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Prologue 


We  develop  formal  techniques  that  give  users  flexibility  in  examining  design  errors  discovered 
by  automated  analysis.  We  build  our  results  using  the  model  checking  approach  to  verification.  The 
two  inputs  to  a  model  checker  arc  a  finite  system  model  and  a  formal  correctness  property  specifying 
acceptable  behaviors.  The  correctness  property  induces  a  bipartition  on  the  set  of  behaviors  of 
the  model:  correct  behaviors,  which  satisfy  the  property,  and  faulty  behaviors,  which  violate  the 
property.  Traditional  model  checkers  give  users  a  single  counterexample,  chosen  from  the  set  of 
faulty  behaviors.  Giving  the  user  access  to  the  entire  set,  however,  lets  him  have  more  control  over 
the  design  refinement  process.  The  focus  of  our  work  is  on  ways  of  generating,  presenting,  and 
analyzing  faulty  behavior  sets. 

We  present  our  results  in  two  pa  its.  In  Paid  I  we  introduce  concepts  that  let  us  define  faulty  be¬ 
havior  sets  as  failure  scenario  graphs.  We  then  describe  algorithms  for  generating  scenario  graphs. 
The  algorithms  use  model  checking  techniques  to  produce  faulty  behavior  sets  that  are  sound  and 
complete. 

In  Part  II  we  apply  our  formal  concepts  to  the  security  domain.  Building  on  the  foundation 
established  in  Paid  I,  we  define  and  analyze  attack  graphs ,  an  application  of  scenario  graphs  to 
represent  ways  in  which  intruders  attack  computer  networks.  This  application  of  formal  analysis 
contributes  to  techniques  and  tools  for  strengthening  network  security. 
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Part  I 


Scenario  Graphs 


Chapter  1 


Introduction 


For  the  past  twenty  years,  model  checking  has  been  used  successfully  in  many  engineering  projects. 
Model  checkers  assist  the  engineer  in  identifying  automatically  individual  design  flaws  in  a  system. 
A  typical  model  checker  takes  as  input  a  model  of  the  system  and  a  correctness  specification.  It 
checks  the  model  against  the  specification  for  erroneous  behavior.  If  erroneous  behavior  exists, 
the  model  checker  produces  an  example  that  helps  the  user  understand  and  address  the  problem. 
Once  the  problem  is  fixed,  the  user  can  repeat  the  process  until  the  model  satisfies  the  specification 
perfectly. 

In  some  situations  the  process  of  repeatedly  checking  for  and  fixing  individual  flaws  does  not 
work  well.  Sometimes  it  is  not  feasible  to  eliminate  every  undesirable  behavior.  For  instance, 
network  security  cannot  in  practice  be  made  perfect  due  to  a  combination  of  factors:  software  com¬ 
plexity,  desire  to  keep  up  with  the  latest  features,  expense  of  fixing  known  system  vulnerabilities, 
etc.  Despite  these  difficulties,  network  system  administrators  would  invest  the  time  and  resources 
necessary  to  find  and  prevent  the  most  destructive  intrusion  scenarios.  However,  the  “find  problem- 
fix  problem-repeat”  engineering  paradigm  inherent  in  traditional  uses  of  model  checkers  does  not 
make  it  easy  to  prioritize  the  problems  and  focus  the  limited  available  resources  on  the  most  pressing 
tasks. 

We  adapt  model  checking  to  situations  where  the  ideal  of  a  perfect  design  is  not  feasible.  A 
recently  proposed  formalism  appropriate  for  this  task  is  failure  scenario  graphs  [53].  Informally,  a 
failure  scenario  graph  is  a  succinct  representation  of  all  execution  paths  through  a  system  that  violate 
some  correctness  condition.  Scenario  graphs  show  everything  that  can  go  wrong  in  a  system,  leaving 
the  engineer  free  to  prioritize  the  problems  as  appropriate.  In  Paid  I  of  this  thesis  we  give  formal 
definitions  for  scenario  graphs  and  develop  algorithms  that  generate  scenario  graphs  automatically 
from  finite  models.  Paid  II  contains  a  detailed  discussion  of  a  specific  kind  of  scenario  graph: 
attack  graphs.  We  present  examples  where  the  system  administrator  uses  attack  graphs  to  assess  the 
network’s  exposure  to  attack  and  evaluates  effectiveness  of  various  security  fixes. 

In  the  appendices  we  briefly  address  further  avenues  of  research  that  can  take  advantage  of  the 
bird’s-eye  view  of  the  system  offered  by  scenario  graphs.  In  the  area  of  embedded  systems,  it  is 
frequently  impossible  to  predict  adverse  environmental  behavior  in  advance,  or  deal  perfectly  with 
those  situations  that  can  be  foreseen.  More  generally,  any  design  problem  involving  reactive  systems 
that  has  to  deal  with  an  unpredictable  environment  could  benefit  from  scenario  graph  analysis. 
Whenever  we  can  anticipate  the  environment  to  be  hostile  or  unpredictable,  designing  the  system  to 
cope  with  the  environment  becomes  a  priority. 

In  Appendix  A  we  treat  such  situations  as  games  between  the  system  and  the  environment. 
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CHAPTER  1.  INTRODUCTION 


where  the  environment  sometimes  makes  hostile  moves  and  the  system  must  find  an  effective 
counter-move.  Adopting  this  stance,  we  view  a  scenario  graph  as  a  paid  of  the  decision  graph 
of  the  game,  with  moves  by  the  environment  and  counter-moves  by  the  system.  Specifically,  the 
scenario  graph  is  equivalent  to  the  decision  subgraph  that  includes  all  scenarios  where  the  game  is 
won  by  the  environment. 

When  we  arc  modeling  a  system  that  operates  in  an  uncertain  environment,  certain  transitions 
in  the  model  represent  the  system's  reaction  to  changes  in  the  environment.  We  can  think  of  such 
transitions  as  being  outside  of  the  system's  control — they  occur  when  triggered  by  the  environment. 
When  no  empirical  information  is  available  about  the  relative  likelihood  of  such  environment-driven 
transitions,  we  can  model  them  only  as  nondeterministic  “choices”  made  by  the  environment.  How¬ 
ever,  sometimes  we  have  empirical  data  that  make  it  possible  to  assign  probabilities  to  environment- 
driven  transitions.  In  Appendix  B  we  show  how  to  incorporate  probabilities  into  scenario  graphs 
and  demonstrate  a  correspondence  between  the  resulting  construction  and  Markov  Decision  Pro¬ 
cesses  [3]  (MDPs).  The  correspondence  lets  us  take  advantage  of  the  existing  body  of  algorithms 
for  analyzing  MDPs. 

Traditional  automata-theoretic  model  checkers  accept  correctness  properties  written  in  temporal 
logic  and  convert  them  into  automata.  Existing  converstion  algorithms  work  with  automata  accept¬ 
ing  infinite  executions  only.  Since  our  techniques  arc  designed  to  work  with  both  finite  and  infinite 
paths,  in  Appendix  C  we  sketch  a  modification  to  the  classic  algorithm  for  converting  LTL  formu¬ 
las  [39]  to  Biichi  automata.  The  modified  algorithm  outputs  Biichi  automata  that  accept  both  finite 
and  infinite  executions. 

The  rest  of  Paid  I  is  organized  as  follows.  In  Chapter  2  we  define  scenario  graphs  and  the 
associated  terminology  used  throughout  the  thesis.  Chapter  3  presents  algorithms  for  generating 
scenario  graphs  from  finite  models.  In  Chapter  4  we  measure  performance  of  our  scenario  graph 
generator  and  evaluate  some  algorithmic  adjustments  that  trade  off  running  time  and  completeness 
guarantees  for  memory  requirements. 


Chapter  2 

Scenario  Graphs:  Definitions 


In  this  chapter  we  present  definitions  and  terminology.  Our  first  task  is  to  define  data  structures 
to  represent  all  failures  in  a  system.  We  call  such  structures  failure  scenario  graphs ,  or  scenario 
graphs  for  brevity. 


2.1  Scenario  Graphs 

Broadly,  system  requirements  (also  called  correctness  properties )  can  be  classified  into  two  cate¬ 
gories,  safety  and  liveness  properties.  Safety  properties  state  that  nothing  bad  ever  happens  in  the 
system.  For  example:  an  intruder  never  gets  user  or  administrative  privileges  on  the  network;  a  sin¬ 
gle  component  failure  does  not  bring  the  entire  system  down;  the  controller  does  not  allow  the  boiler 
temperature  to  rise  above  a  certain  level.  Safety  properties  can  be  expressed  as  an  invariant  stating 
that  the  system  never  enters  an  unsafe  state.  Informally,  a  scenario  graph  representing  violations  of  a 
safety  property  incorporates  all  the  executions  of  a  system  that  lead  to  an  unsafe  state.  We  can  view 
a  scenario  graph  as  a  directed  graph  where  edges  represent  state  transitions  and  vertices  represent 
system  states.  Certain  states  arc  labeled  unsafe.  Each  path  from  an  initial  state  to  an  unsafe  state  is 
an  execution  of  the  system  violating  the  safety  property. 

Safety  properties  alone  arc  not  enough  to  specify  correct  behavior  in  interesting  systems — a 
system  with  an  empty  behavior  set  is  safe,  but  not  very  useful.  We  use  liveness  requirements  to 
specify  the  active  tasks  the  system  is  designed  to  do.  For  example,  a  banking  system  might  have  a 
liveness  requirement  stating  that  if  a  check  is  deposited  and  sufficient  funds  arc  available,  the  check 
eventually  clears.  A  violation  of  a  liveness  requirement  occurs  when  the  system  gets  stuck,  either 
by  stopping  completely  or  going  around  behavioral  cycles  that  do  not  accomplish  the  tasks  specified 
in  the  liveness  requirement.  To  represent  liveness  violations,  both  the  modeling  language  and  the 
scenario  graph  formalism  must  support  infinite  executions. 

To  account  for  both  types  of  correctness  properties,  we  define  scenario  graphs  with  respect  to 
models  that  admit  both  finite  and  infinite  executions.  A  simple  formalism  capable  of  describing 
such  models  is  the  Biichi  automaton. 

2.1.1  Biichi  Automata 

Traditional  Biichi  automata  are  nondeterministic  finite  automata  equipped  with  an  acceptance  con¬ 
dition  that  works  for  infinite  words  (cu- words):  an  tu-word  is  accepted  if  and  only  if  the  automaton 
can  read  the  word  from  left  to  right  while  visiting  a  sequence  of  states  in  which  an  acceptance  state 
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regular  state 
initial  state 
acceptance  state 
final  state 

acceptance  +  final  state 


Figure  2.1:  Biichi  Automaton 

occurs  infinitely  often.  This  is  called  the  Biichi  acceptance  condition.  Since  we  wish  to  include 
finite  executions,  we  extend  the  set  of  executions  accepted  by  Biichi  automata  to  finite  executions 
whose  last  state  is  a  final  state  in  the  automaton. 

Hereafter,  we  will  work  exclusively  with  Biichi  automata  that  accept  words  over  the  alphabet 
A  =  2AP ,  where  AP  is  a  set  of  atomic  propositions.  A  word  in  the  alphabet — that  is,  a  member  of 
AC  U  A*,  is  called  a  propositional  sequence. 

Definition  2.1  Given  a  set  of  atomic  propositions  AP,  a  Biichi  automaton  over  the  alphabet  A  = 
2ap  is  a  6-tuple  B  =  ( S ,  t,  So,  Sa,  Sf,  D )  with  finite  state  set  S,  transition  relation  r  C  S  X  S, 
initial  state  set  So  C  S.  a  set  Sa  Q  S  of  acceptance  states,  a  set  Sf  C  S  of  final  states,  and  a 
labeling  I)  :  S  — ►  2A  of  states  with  sets  of  letters  from  A. 

Definition  2.2  An  infinite  execution  of  a  Biichi  automaton  B  =  (S,r,  So,  Sa,  Sf,  D)  on  an  infinite 
propositional  sequence  £  =  IqI  i  . . .  ln  . . .  is  a  sequence  of  states  a  =  SoSi  . . .  sn  . . .  such  that 
so  E  So  and  for  all  i  >  0,  (sj,  Sj+i)  E  r  and  l,  E  D(si).  We  say  that  the  execution  a  accepts  the 
propositional  sequence  £. 

An  infinite  execution  a  is  successful  if  some  acceptance  state  sa  E  Sa  occurs  infinitely  often  in 
a.  B  accepts  propositional  sequence  £  if  there  exists  a  successful  execution  of  B  that  accepts  £. 


Definition  2.3  A  finite  execution  of  a  Biichi  automaton  B  =  (S,  t,  Sq,  Sa,  Sf,  D)  on  a  finite  propo¬ 
sitional  sequence  £  =  IqI\  . .  .ln  is  a  sequence  of  states  a  =  So^i . . .  sn  such  that  so  E  So  and  for 
all  0  <  i  <  n,  ( Si ,  Si+i)  E  r  and  lt  E  D(si).  We  say  that  the  execution  a  accepts  the  propositional 
sequence  f. 

A  finite  execution  a  is  successful  if  sn  E  Sf.  B  accepts  propositional  sequence  £  if  there  exists 
a  successful  execution  of  B  that  accepts  f. 

It  is  often  convenient  to  name  a  transition  (s,  s')  E  r  with  a  label:  t  =  (s,  s').  Using  this  convention, 
we  will  sometimes  include  transition  labels  explicitly  when  writing  out  an  execution,  e.g.  a  = 

sotosi  . . .  ti-iSi . . .,  where  for  alH  >  0  t i  =  (sj,  s;+i). 
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Definition  2.4  The  set 


C{B)  =  {a  €  AC  U  A*  |  B  accepts  a} 
is  the  language  recognized  by  Biichi  automaton  B. 

Figure  2.1  is  an  example  of  a  Biichi  automaton.  The  set  of  acceptance  states  Sa  is  marked  with 
double  circles.  The  language  recognized  by  the  automaton  includes  any  infinite  sequence  that  goes 
through  any  of  the  three  acceptance  states  infinitely  often.  An  acceptance  state  may  or  may  not  also 
be  an  final  state.  In  Figure  2. 1  the  final  states  arc  shaded.  Any  finite  sequence  of  states  that  begins 
in  an  initial  state  and  ends  in  one  of  the  final  (shaded)  states  is  accepted  by  the  automaton. 


2.1.2  Generalized  Biichi  Automata 

Sometimes  it  is  convenient  to  work  with  a  Biichi  automaton  with  several  accepting  sets,  even  though 
this  addition  does  not  extend  the  expressive  power  of  the  Biichi  formalism.  In  particular,  algorithms 
presented  in  Chapter  3  convert  Linear  Temporal  Logic  formulas  into  generalized  B iichi  automata. 
Instead  of  a  single  acceptance  set  Sa,  a  generalized  Biichi  automaton  has  a  set  of  acceptance  sets 
T  C  2s.  The  automaton  accepts  an  execution  a  if  and  only  if  a  visits  each  acceptance  set  F)  £  T 
infinitely  often.  There  is  a  simple  translation  of  a  generalized  Biichi  automaton  to  an  equivalent 
Biichi  automaton  [19],  which  we  omit  here. 


2.1.3  Biichi  Models 

We  say  that  a  Biichi  automaton  M  is  a  Biichi  model  of  system  J  if  the  language  L(M)  is  a  subset  of 
the  possible  behaviors  of  J.  Hereafter,  we  shall  consider  Biichi  models  over  the  alphabet  A  =  2AP, 
where  the  set  of  atomic  propositions  AP  describes  the  states  of  the  system  that  we  would  like  to 
model.  For  each  state  s  in  a  Biichi  model,  the  label  D(s)  is  a  singleton  set  containing  one  letter  of 
A.  The  letter  identifies  a  particular  state  in  the  modeled  system  J. 

A  scenario  is  a  synonym  for  a  word  in  the  alphabet — that  is,  a  propositional  sequence  in  A^'UA* . 
A  scenario  describes  a  sequence  of  changes  in  the  system  state.  The  alphabet  thus  serves  as  the 
formal  link  between  the  system  J  and  the  Biichi  automaton  that  models  it.  We  distinguish  between 
finite  scenarios  (members  of  A*)  and  infinite  scenarios  (members  of  Au). 

Since  each  state  of  a  Biichi  model  M  is  labeled  by  exactly  one  letter  of  A,  each  execution 
a  =  sosi . . .  of  M  always  accepts  exactly  one  scenario  r/  =  D(so)D(si) . . ..  When  the  context 
is  clear,  we  sometimes  use  the  term  scenario  to  refer  directly  to  an  execution  of  M  instead  of  the 
corresponding  propositional  sequence. 


2.1.4  Scenario  Graphs 

Given  a  Biichi  model  M  over  the  alphabet  2AP,  a  scenario  graph  is  a  subset  of  L(M),  i.e.  a  collec¬ 
tion  of  scenarios  in  the  model.  Most  such  collections  arc  uninteresting;  we  care  particularly  about 
scenario  graphs  that  describe  erroneous  system  behavior  with  respect  to  some  correctness  condition. 
Scenarios  can  be  finite  or  infinite,  so  we  use  Biichi  automata  to  represent  both  the  model  and  the 
scenario  graph.  We  refer  to  Biichi  automata  representing  scenario  graphs  as  scenario  automata. 
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Definition  2.5  Given  a  Btichi  model  over  a  set  of  atomic  propositions  AP  and  alphabet  A  =  2AP, 
a  correctness  property  P  is  a  subset  of  the  set  of  scenarios  A^  U  A*.  A  scenario  a  G  Au  U  A*  is 
correct  with  respect  to  P  iff  a  G  P.  A  scenario  a  is  failing  with  respect  to  P  (violates  P)  iff  a  (fi  P. 

We  say  that  a  Biichi  model  M  satisfies  a  correctness  property  P  if  it  does  not  accept  any  failing 
scenarios  (that  is,  if  L{M )  C  P).  If,  however,  M  does  accept  some  failing  scenarios,  we  say  that 
the  set  of  such  scenarios,  L(M)\P,  is  the  failure  scenario  graph  of  M  with  respect  to  P. 

Traditional  model  checking  techniques  focus  on  producing  one  member  of  a  failure  scenario 
graph  as  a  counterexample.  In  Chapter  3  we  present  algorithms  that  generate  a  full  scenario  automa¬ 
ton  for  a  Biichi  model  M  and  correctness  property  P.  The  algorithms  guarantee  that  the  generated 
scenario  automaton  accepts  every  failing  scenario,  yet  does  not  contain  any  extraneous  states  and 
transitions. 

Definition  2.6  Given  a  Biichi  model  M  and  a  correctness  property  P,  a  scenario  automaton  M p  = 
(S,t,  So,  Sa,  Sf,  D)  is  sound  if  every  scenario  accepted  by  Mp  is  a  failing  scenario  of  M  with 
respect  to  P:  L(Mp)  C  L(M)\P. 


Definition  2.7  Given  a  Biichi  model  M  and  a  correctness  properly  P,  a  scenario  automaton  M p  = 
(S,  r,  ,S'o,  Sa.  Sj.  D)  is  exhaustive  if  it  accepts  every  failing  scenario  of  M  with  respect  to  P: 
L(MP)  D  L(M)\P. 


Definition  2.8  Given  a  Biichi  model  M  and  a  correctness  property  P,  a  scenario  automaton  M p  = 
(S,  r,  S0,  Sa,  Sf,  D )  is  succinct  if  the  following  two  conditions  hold: 

(1)  For  every  state  s  G  S  there  exists  a  failing  scenario  a  G  L(M)\P  containing  s 

(2)  For  every  transition  t  G  r  there  exists  a  failing  scenario  a  G  L(M)\P  containing  t. 

2.1.5  Examples 

The  structure  of  a  scenario  automaton  depends  on  the  structure  of  the  model  and  the  type  of  the 
correctness  property.  Figure  2.2  shows  three  different  examples.  In  the  first  example  (Figure  2.2(a)) 
the  scenario  automaton  represents  violations  of  a  safety  property  (“6  is  never  true”)  in  a  model 
without  cycles.  The  scenario  automaton  is  a  directed  acyclic  graph,  with  the  final  states  shaded.  In 
the  second  example  (Figure  2.2(b))  the  underlying  model  contains  cycles,  and  the  resulting  scenario 
automaton  is  a  directed  graph  with  cycles. 

Finally,  a  model  with  cycles  and  a  liveness  property  (“Eventually  b  is  true”)  produce  a  general 
Biichi  scenario  automaton  (Figure  2.2(c)).  The  shaded  nodes  in  (c)  represent  cycles  in  which  b  never 
becomes  true.  Automaton  executions  that  reach  those  cycles  along  a  path  where  b  is  always  false 
violate  the  liveness  property.  The  set  of  such  executions  is  the  set  of  counterexamples  to  the  liveness 
property. 

2.1.6  Representative  Scenarios 

Scenario  automata  representing  all  failing  scenarios  can  be  quite  large.  Sometimes  it  is  possible  to 
represent  the  failing  scenarios  in  a  more  compact  way.  We  observe  that  a  Biichi  model  may,  after 
a  finite  number  of  executions  steps,  find  itself  in  a  state  where  every  possible  continuation  leads  to 
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□  -*  □  -b 


Figure  2.2:  Scenario  Automata  Examples 


a  violation  of  the  correctness  property.  The  simplest  example  is  a  finite  execution  a  ending  in  a 
state  where  some  proposition  q  is  true.  The  execution  a  is  a  valid  scenario  violating  the  invariant 
correctness  property  G ->q,  and  so  is  every  finite  or  infinite  execution  with  the  prefix  a.  In  practice, 
including  all  of  these  executions  in  a  scenario  graph  is  counter-productive.  Instead,  we  can  include 
only  the  execution  a  and  designate  it  as  a  representative  for  all  other  executions  that  have  the  prefix 
a. 

Figure  2.3(a)  illustrates  a  part  of  a  Biichi  automaton  where  a  set  of  finite  executions  can  be 
replaced  with  a  representative  prefix.  Consider  a  finite  prefix  ending  in  the  state  indicated  by  the 
arrow.  However  we  choose  to  extend  the  prefix  into  a  finite  scenario,  the  scenario  will  be  accepted 
by  the  automaton.  Thus,  it  is  possible  to  replace  that  part  of  the  automaton  with  a  single  state  and 
designate  scenarios  ending  in  this  state  as  representative  (Figure  2.3(b)). 

Fess  obviously,  this  observation  also  applies  to  situations  where  all  infinite  executions  with  a 
certain  finite  prefix  violate  a  liveness  property.  Consider  the  FTF  correctness  property 

G  F  DeviceEncibled 

stating  that  some  device  must  be  enabled  infinitely  often  in  every  execution.  Only  infinite  executions 
could  properly  be  considered  as  scenarios  violating  this  property.  However,  if  a  certain  finite  exe¬ 
cution  a  reaches  a  state  s  from  which  it  is  not  possible  to  reach  another  state  where  DeviceEnabled 
holds,  then  we  know  that  any  infinite  scenario  with  the  prefix  a  is  a  violation  of  the  correctness 
property,  a  could  therefore  serve  as  a  representative  for  an  entire  set  of  infinite  scenarios  in  the 
scenario  graph,  reducing  graph  clutter  and  size  without  losing  interesting  information. 

Figure  2.4(a)  illustrates  a  part  of  a  Biichi  automaton  where  a  set  of  infinite  executions  can  be 
replaced  with  a  (finite)  representative  prefix.  Consider  a  finite  prefix  ending  in  the  state  indicated 
by  the  arrow.  However  we  choose  to  extend  the  prefix  into  an  infinite  scenario,  the  scenario  will  be 
accepted  by  the  automaton.  Thus,  it  is  possible  to  replace  that  part  of  the  automaton  with  a  single 
state  and  designate  scenarios  ending  in  this  state  as  representative  (Figure  2.4(b)). 

The  key  to  constructing  representatives  for  finite  scenarios  is  picking  out  short  executions  that 
prefix  a  class  of  (longer)  failing  executions. 
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f-representative 

scenario 


(a)  (b) 

Figure  2.3:  Representative  Scenario  for  Finite  Executions 
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i-representative 

scenario 


(a) 


(b) 


Figure  2.4:  Representative  Scenario  for  Infinite  Executions 
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Definition  2.9  Given  a  Biichi  automaton  M  =  ( S ,  r,  So,  Sa,  Sf,  D )  and  a  correctness  property  P, 
a  finite  execution  7  in  M  is  f-representative  with  respect  to  P  if  every  finite  execution  a  that  has 
prefix  7  is  a  failing  execution  of  M  with  respect  to  P. 


Definition  2.10  Given  a  Biichi  automaton  M  =  ( S ,  t,  So,  Sa  ■  Sf,  D ),  a  correctness  property  P, 
and  a  finite  execution  a  in  M,  we  say  that  a  finite  execution  7  is  an  f-prefix  of  a  with  respect  to  P  if 
( 1)  7  if  a  prefix  of  a  and  (2)  7  is  f-representative  with  respect  to  P.  An  fn -prefix  of  a  is  an  f-prefix 
of  size  n. 


From  here  on,  whenever  the  correctness  property  P  is  clear  from  context,  we  shall  refer  to  f- 
representative  scenarios  and  f-prefixes  without  mentioning  P  explicitly. 


Definition  2.11  The  shortest  f-prefix  7  of  a  is  the  characteristic  f-prefix  of  a. 


Definition  2.12  Two  finite  executions  a  \  and  ct  2  are  /n-equivalent  (designated  =fn,  n  G  N)  if  they 
share  an  identical  fn-prefix.  The  notation  [cr]=/n  designates  the  fn -equivalence  class  of  execution  a 
with  respect  to  the  relation  =fn.  The  fn-prefix  of  a  is  also  the  fn -prefix  of  the  fn- equivalence  class 


A  similar  set  of  special  finite  prefixes  can  be  identified  for  infinite  executions. 


Definition  2.13  Given  a  Biichi  automaton  M  =  (S,r,  Sq,  Sa,  Sf,  D)  and  a  correctness  property 
P,  a  finite  execution  7  in  M  is  i-representative  with  respect  to  P  if  every  infinite  execution  j3  that 
has  prefix  7  is  a  failing  execution  of  M  with  respect  to  P. 


Definition  2.14  Given  a  Biichi  automaton  M  =  ( S ,  t,  Sq,  Sa,  Sf,  D),  a  correctness  property  P, 
and  an  infinite  execution  (3  in  M,  we  say  that  a  finite  execution  7  is  an  i-prefix  of  (3  with  respect 
to  P  if  (1)  7  if  a  prefix  of  (3  and  (2)  7  is  i-representative  with  respect  to  P.  An  in- prefix  of  (3  is  an 
i-prefix  of  size  n. 


From  here  on,  whenever  the  correctness  property  P  is  clear  from  context,  we  shall  refer  to  i- 
representative  scenarios  and  i-prefixes  without  mentioning  P  explicitly. 


Definition  2.15  The  shortest  i-prefix  7  of  3  is  the  characteristic  i-prefix  of  (3. 


Definition  2.16  Two  infinite  executions  (3\  and  ff  are  in-equivalent  (designated  )  if  they  share 
an  identical  in-prefix  7.  The  notation  [0\=i  designates  the  ^-equivalence  class  of  execution  [3  with 
respect  to  the  relation  =,n.  The  in-prefix  of  (3  is  also  the  i„ -prefix  of  the  inequivalence  class  \J3\=i  . 
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2.1.7  Representative  Scenario  Automata 

When  appropriate,  we  replace  full  scenario  automata  with  automata  that  accept  an  f-prefix  or  an 
i-prefix  execution  of  every  failing  scenario.  Each  such  execution  a  then  serves  as  a  representative 
for  the  entire  equivalence  class  of  failing  scenarios  that  have  the  prefix  a.  Omitting  these  other 
scenarios  from  the  scenario  graph  reduces  clutter  and  size. 

Definition  2.17  Given  a  BUchi  model  M  and  a  correctness  property  P,  a  representative  scenario 
automaton  (or  rs-automaton/or  short)  is  a  triple  Mrp  =  (Mr,  Srp  Srj),  where  Mr  =  (S,t,  Sq,  Sa,  Sf, 
D)  is  a  BUchi  automaton,  Srpand  Srj  are  subsets  of  Sf,  and  executions  in  the  language  L(Mr)  fall 
into  three  classes: 

1.  Any  finite  execution  a  £  L(Mr )  whose  final  state  is  in  Srf  is  an  f-prefix  of  some  failing 
execution  in  M.  We  call  such  executions  f-scenarios. 

2.  Any  finite  execution  7  £  L(Mr )  whose  final  state  is  in  Srj  is  an  i-prefix  of  some  infinite  failing 
execution  in  M.  We  such  executions  i-scenarios. 

3.  Any  execution  in  L(Mr )  that  does  not  belong  to  the  other  two  classes  is  a  failing  execution  in 
M.  We  call  such  executions  r-scenarios. 

In  an  rs-automaton  Mrp  =  (Mr,  Srf,  Sri )  each  f-scenario  a  serves  as  a  representative  for  the  entire 
equivalence  class  [cr]=|a|  of  finite  failing  scenarios  of  M.  Similarly,  each  i-scenario  7  serves  as  a 
representative  for  the  entire  equivalence  class  [7]=  (  of  infinite  failing  scenarios  of  M.  Thus,  it 
is  understood  that  every  finite  execution  of  M  with  an  f-scenario  prefix  in  Mrp  and  every  infinite 
execution  of  M  with  an  i-scenario  prefix  in  Mfi  is  a  failing  scenario  of  M  with  respect  to  property 
P. 

Ideally,  an  rs-automaton  would  accept  only  the  characteristic  (that  is,  shortest)  prefix  of  every 
full  failing  scenario.  However,  it  may  not  be  efficient  to  look  for  the  shortest  prefix  in  every  case. 
Furthermore,  it  may  not  be  feasible  or  desirable  to  replace  every  failing  scenario  of  M  with  its 
representative  prefix.  So  the  definition  of  rs-automata  leaves  us  room  to  include  f-  and  i-prefixes  of 
any  size,  as  well  as  full  failing  scenarios. 

Clearly,  rs-automata  have  weaker  soundness  and  exhaustiveness  guarantees  than  regular  sce¬ 
nario  automata.  Definitions  2.18  and  2.19  specify  new,  weaker  soundness  and  exhaustiveness  con¬ 
ditions  that  hold  for  representative  scenario  automata. 

Definition  2.18  Given  a  BUchi  model  M  and  a  correctness  property  P,  an  rs-automaton  Mp  = 
(Mr,  Srf ,  Sri)  is  sound  if  for  every  scenario  7  accepted  by  Mp  one  of  the  following  holds: 

1.  7  is  failing  scenario  of  M  with  respect  to  P,  or 

2.  7  is  an  f-prefix  of  some  finite  failing  scenario  a  of  M,  or 

3.  7  is  an  i-prefix  of  some  infinite  failing  scenario  /3  of  M 


Definition  2.19  Given  a  BUchi  model  M  and  a  correctness  property  P,  an  rs-automaton  Mp  = 
(Mr,  Srf,  S,i)  is  exhaustive  if  for  every  failing  scenario  7  ofM  with  respect  to  P  one  of  the  following 
holds: 

1.  7  is  an  r-scenario  in  Mp,  or 

2.  anfn-prefix  of  y  is  an  f-scenario  in  Mpfor  some  value  ofn,  or 

3.  an  in-prefix  of  7  is  an  i-scenario  in  Mfifor  some  value  ofn 
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The  succinctness  condition  for  rs-automata  is  identical  to  the  succinctness  condition  of  regular  sce¬ 
nario  automata. 

Occasionally  it  is  useful  to  put  finite  and  infinite  scenarios  into  separate  rs-automata.  For  such 
cases  we  define  exhaustiveness  conditions  that  treat  finite  and  infinite  scenarios  separately. 

Definition  2.20  Given  a  Biichi  model  M  and  a  correctness  property  P.  an  rs-automaton  Mp  = 
( Mr ,  Sff,  Stf)  is  f-exhaustive  if  for  every  finite  failing  scenario  7  of  M  with  respect  to  P  one  of  the 
following  holds: 

1.  7  is  an  r-scenario  in  Mp,  or 

2.  anfn-prefix  of  7  is  an  f-scenario  in  Mfifor  some  value  ofn 


Definition  2.21  Given  a  Biichi  model  M  and  a  correctness  property  P,  an  rs-automaton  Mfi  = 
( Mr ,  Sri)  is  i-exhaustive  if  for  every  infinite  failing  scenario  7  of  M  with  respect  to  P  one  of 

the  following  holds: 

1.  7  is  an  r-scenario  in  Mp,  or 

2.  an  in-prefix  of  y  is  an  i-scenario  in  Mpfor  some  value  ofn 

2.1.8  Safety  Scenario  Graphs 

In  general,  we  require  the  full  power  of  Biichi  automata  to  represent  arbitrary  scenario  graphs. 
Safety  properties  arc  a  particular-  kind  of  correctness  property  that  admit  simpler  representations 
for  corresponding  scenario  graphs.  A  safety  property  P  is  associated  with  a  characteristic  logical 
predicate  Q />  :  2AP  — ►  bool  specifying  a  set  of  states  (from  2AP)  that  are  not  allowed  to  appear-  in 
any  scenario  in  P.  Any  infinite  scenario  that  fails  with  respect  to  P  must  have  a  finite  prefix  ending 
in  a  state  s  where  Qp(s)  is  false.  In  practice,  the  set  of  all  such  prefixes  is  usually  a  sufficient 
substitute  for  the  full  scenario  graph.  When  this  is  the  case,  we  can  represent  the  graph  with  a 
simple  rs-automaton. 

Definition  2.22  Given  a  Biichi  model  M  and  a  safety  property  P,  a  safety  scenario  graph  is  an 
rs-automaton  Mp  =  ( Mr ,  Srf,  Srj)  where 

Mr  =  (S,r,  So,  0,  Sf,  D) 

Srf  =  Sf 

Sif  =  Sf 

Safety  scenario  graphs  accept  only  fn-  and  in-prefixes  of  failing  scenarios  of  M.  They  do  not  accept 
any  infinite  scenarios.  Therefore,  in  a  safety  scenario  graph  the  set  of  acceptance  states  Sa  is  always 
empty  and  sets  Sry  and  Srj  are  always  equal  to  Sf.  Whenever  context  permits,  we  will  omit  those 
components  and  specify  safety  scenario  graphs  as  a  5-tuple  ( S ,  r,  So,  Sf,  D). 
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Chapter  3 


Algorithms 


We  turn  our  attention  now  to  algorithms  for  generating  scenario  automata  automatically.  Starting 
with  a  description  of  a  model  M  and  correctness  property  P.  the  task  is  to  construct  a  scenario 
automaton  representing  all  executions  of  M  that  violate  P.  It  is  essential  that  the  automata  produced 
by  the  algorithms  be  sound,  exhaustive  and  succinct. 


3.1  Background:  Temporal  Logic 

Temporal  logic  is  a  formalism  for  describing  sequences  of  transitions  between  states  in  a  reactive 
system.  In  the  temporal  logics  that  we  will  consider,  time  is  not  mentioned  explicitly;  instead,  a 
formula  might  specify  that  eventually  some  designated  state  is  reached,  or  that  an  error  state  is 
never  entered.  Properties  like  eventually  or  never  arc  specified  using  special  temporal  operators. 
These  operators  can  also  be  combined  with  boolean  connectives  or  nested  arbitrarily.  Temporal 
logics  differ  in  the  operators  that  they  provide  and  the  semantics  of  those  operators.  We  will  cover 
the  temporal  logic  CTL*  and  its  sub-logic  LTL. 

3.1.1  The  Computation  Tree  Logic  CTL* 

Conceptually,  CTL*  formulas  describe  properties  of  computation  trees.  The  tree  is  formed  by  des¬ 
ignating  a  state  in  a  Biichi  model  as  the  initial  state  and  then  unwinding  the  model  into  an  infinite 
tree  with  the  designated  state  as  the  root.  The  computation  tree  shows  all  of  the  possible  executions 
starting  from  the  initial  state. 

In  CTL*  formulas  arc  composed  of  path  quantifiers  and  temporal  operators.  The  path  quan¬ 
tifiers  arc  used  to  describe  the  branching  structure  in  the  computation  tree.  There  arc  two  such 
quantifiers:  A  (for  “all  computation  paths”)  and  E  (for  “some  computation  path”).  These  quantifiers 
are  used  in  a  particular  state  to  specify  that  all  of  the  paths  or  some  of  the  paths  starting  at  that  state 
have  some  property.  The  temporal  operators  describe  properties  of  a  path  through  the  tree.  There 
are  five  basic  operators: 

•  The  X  (“next  time”)  operator  requires  that  a  property  holds  in  the  second  state  of  the  path. 

•  The  F  (“eventually”)  operator  is  used  to  assert  that  a  property  will  hold  at  some  state  on  the 
path. 
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•  The  G  (“always”  or  “globally”)  operator  specifies  that  a  property  holds  at  every  state  on  the 
path. 

•  The  U  (“until”)  operator  is  used  to  combine  two  properties.  It  holds  if  there  is  a  state  on 
the  path  where  the  second  property  holds,  and  at  every  preceding  state  on  the  path  the  first 
property  holds. 

•  The  R  (“release”)  operator  is  the  logical  dual  of  the  U  operator.  It  requires  that  the  second 
property  holds  along  the  path  up  to  and  including  the  first  state  where  the  first  property  holds. 
However,  the  first  property  is  not  required  to  hold  eventually. 

There  arc  two  types  of  formulas  in  CTL* :  state  formulas  (which  arc  true  in  a  specific  state)  and  path 
formulas  (which  arc  true  along  a  specific  path).  Let  AP  be  a  set  of  atomic  propositions.  The  syntax 
of  state  formulas  is  given  inductively  by  the  following  rules: 

•  If  p  6  AP,  then  p  is  a  state  formula. 

•  If  /  and  g  are  state  formulas,  then  ->/,  /  V  g,  and  /  A  g  are  state  formulas. 

•  If  /  is  a  path  formula,  then  E  /  and  A  /  are  state  formulas. 

Two  additional  rules  specify  the  syntax  of  path  formulas: 

•  If  /  is  a  state  formula,  then  /  is  also  a  path  formula. 

•  If  /  and  g  arc  path  formulas,  then  — ■/,  /  V  <7,  /  A  <7,  X  /,  F  f  ,G  f ,  f  \]  g,  and  /  R  g  arc  path 
formulas. 

CTL*  is  the  set  of  state  formulas  generated  by  the  above  rules. 

We  define  the  semantics  of  CTL*  with  respect  to  a  Biichi  model  M  =  (S,  r,  So,  Sa,  Sf ,  D).  A 
path  in  M  is  a  sequence  of  states  and  transitions  a  =  sot-o  ■  ■  ■  tn~isn  ■  ■  ■  such  that  for  every  i  >  0, 
( Si,ti ,  Sj+i)  e  t.  A  path  can  be  finite  or  infinite.  We  allow  paths  of  length  zero,  denoted  a  =  [  ]. 
Sometimes  we  will  omit  the  transitions  and  specify  paths  as  a  =  so  ■  ■  ■  sn  . . ..  We  use  a1  to  denote 
the  suffix  of  a  stalling  at  st.  The  length  of  a  finite  path  a,  denoted  | a |,  is  the  number  of  states  in  the 
sequence  a.  Conventionally,  we  say  that  the  length  of  an  infinite  path  is  oo. 

If  /  is  a  CTL*  state  formula,  the  notation  M,  s  \=  f  means  that  /  holds  at  state  s  in  the  Biichi 
model  M.  Similarly,  if  /  is  a  path  formula,  M,  a  \=  f  means  that  /  holds  along  the  path  a  in  the 
Biichi  model  M.  When  the  Biichi  model  M  is  clear  from  context,  we  will  usually  omit  it.  The 
relation  |=  is  defined  inductively  below.  In  the  following  definition,  p  is  an  atomic  proposition,  /, 
/i,  and  f'2  arc  state  formulas,  p  and  'ip  arc  path  formulas,  and  a  is  any  path. 


1. 

M ,  s  \=  p 

P  G  Uze-D(s)  l- 

2. 

M,  s  \=  -if 

<S> 

M,  s  \/=  f. 

3. 

M,s|=/iV/2 

M,  s  |=  /i  or  M,  s  1=  f2. 

4. 

M,s\=h  A/2 

M,  s  \=  fi  and  M,  s  \=  f2. 

5. 

M,  s  \=  E  g 

there  is  a  path  a  from  s  such  that  M,  o 

■  !=  9- 

6. 

M,  s  \=  A  g 

for  every  path  a  starting  from  s,  M,  a 

1  =9- 

7. 

M,  a  \=  f 

s  is  the  first  state  of  a  and  M,  s  \=  f. 

8. 

M,  a  |=  —ip, 

M,  a  p. 
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9. 

M,  a  \=  g  V  ip 

AA 

M ,  a  \=  g  or  M,  a  \=  i/j. 

10. 

M,  a  \=  n  A  ip 

M,  a  |=  /j  and  M,  a  =  ip. 

11. 

M,  a  \=  X  g 

M ,  a 1  |=  ft,. 

12. 

M,  a  \=  F  n 

AA 

there  exists  a  0  <  k  <  a-  such  that  M,  ak  \=  /<. 

13. 

M,  a  \=  G  g 

AA 

for  all  0  <  i  <  \a  ,  M,  a1  =  fi. 

14. 

M,  a  \=  fill  ip 

3  0  <  k  <  |a|  s.t.  M,  ak  =  ip  and  for  all  0  <  j  <  k,  M,  |=  //. 

15. 

M,  a  \=  g  R  ip 

4A 

VO  <  j  <  |  a  |,  if  for  every  i  <  j  M,  a 1  p  fi  then  M,  a J  |=  ip. 

We  use  T  as  an  abbreviation  for  /  V  -1/  and  F  as  an  abbreviation  for  — >T. 

3.1.2  Linear  Temporal  Logic 

We  define  Linear  Temporal  Logic  (LTL)  formulas  over  individual  propositional  sequences  (instead 
of  computation  trees).  The  set  of  well-formed  LTL  formulas  is  constructed  from  the  set  of  atomic 
propositions  AP,  the  boolean  operators  V,  and  A,  and  the  temporal  operators  X,  F,  G,  U,  and  R. 
Precisely, 

•  every  member  of  AP  is  a  formula, 

•  if  /  and  g  are  formulas,  then  -1  /,  /  V  g,  f  A  g,  X  /,  F  /,  G  /,  /  U  g,  and  /  R  g  are  formulas. 

Given  a  propositional  sequence  £  =  Up  . . .  e  AP  U  A*  and  an  LTL  formula  77,  the  notation 
£  |=  r/  means  that  //  holds  along  the  sequence  £.  Formally, 


1. 

4A 

A  G  AP  and  A  G  Iq. 

2. 

C  1=  _,F 

£I^p- 

3. 

£  N  m  v  ip 

£  N  c  or  £  1= 

4. 

£  1=  fi  A  Ip 

£  |=  g,  and  £  |=  ip. 

5. 

£NXir 

AA 

£:  1=  M- 

6. 

£NFp 

AA 

there  exists  a  0  <  k  <  |a  such  that  £A  =  //. 

7. 

£  1=  G  g 

for  all  0  <  i  <  a  ,  £*  =  g. 

8. 

£  1=  T  U  ip 

AA 

3  0  <  k  <  |a|  s.t.  pk  \=  ip  and  for  all  0  <  j  <  k,  £J  =  //. 

9. 

£N/‘R 

AA 

VO  <  j  <  |  a  |,  if  for  every  i  <  j  £*  [A  ^  then  £J  =  ip. 

3.2  Algorithm  I:  Symbolic  Model  Checking 

Model  checking  technology  [11,  19,  59]  has  been  successfully  employed  in  designing  and  debug¬ 
ging  systems.  Model  checking  is  a  technique  for  checking  whether  a  formal  model  M  of  a  system 
satisfies  a  given  property  P.  If  the  property  is  false  in  the  model,  model  checkers  typically  output  a 
counter-example,  or  a  sequence  of  transitions  ending  with  a  violation  of  the  property. 

We  briefly  describe  a  symbolic  algorithm  for  constructing  safety  scenario  graphs  (defined  in 
Section  2.1.8).  The  algorithm  is  due  to  Jha  and  Wing  [53].  The  algorithm  is  a  modification  of  an 
iterative  fixpoint  model  checking  procedure  that  returns  a  single  counterexample  [55].  The  input 
M  =  ( S ,  r,  S0,  S',  S,  D)  is  a  Biichi  model  that  accepts  finite  and  infinite  executions.  The  input  P  is 
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Input: 

M  =  (S,  t,  Sq,  S,  S,  D)  -  Biichi  model 
S  -  set  of  states 
r  C  S  x  S  -  transition  relation 
Sq  C  S  -  set  of  initial  states 

I)  :  S  — r  2ap  -  labeling  of  states  with  propositional  formulas 
P  =  AG(-i unsafe)  -  a  safety  property 

Output: 

Safety  scenario  graph  Gp  =  (Sp,rp,  Sp ,  Sp ,  D) 

Algorithm:  SymScenarioGraph(S',  r,  Sq,  D,  P ) 

-  Construct  binary  decision  diagrams  for  the  components  ofM 

(1)  (Sbdd,Tbdd,Sbdd)  <-  ConstructBDD (5,r,50) 

-  Compute  the  set  of  states  reachable  from  the  set  of  initial  states 

(2)  Sreacf j  <—  ComputeReachable(Sfo^,  rbdd,  Sbdd) 

-  Use  iterative  fixpoint  symbolic  procedure  to  find  the  set  of  states  Sp  where 

-  the  safety  property  P  fails 

(3)  S p  < —  SymModelCheck(iSreac/j,  rbdd,  Sbdd,  D,  P) 

-  Restrict  the  transition  relation  r  to  states  in  the  set  S p 

(4)  r p  < —  t  n  (Sp  x  Sp) 

(5)  s£  <-  s%dd  n  5p 

(6)  Sp  <—  {s  G  SP  |  .D(s)  — r  unsafe} 

(7)  Return(Sp,Tp,  Sp ,  Sp ,  D) 


Figure  3.1:  Symbolic  Algorithm  for  Generating  Safety  Scenario  Graphs 

a  safety  property  written  in  CTL,  a  sub-logic  of  CTL*.  CTL  safety  properties  have  the  form  AG/ 
(i.e.,  P  =  AG/,  where  /  is  a  formula  in  propositional  logic). 

The  algorithm  is  shown  in  Figure  3.1.  The  first  step  is  to  compute  symbolically  the  set  of  states 
reachable  from  the  set  of  initial  states.  The  second  step  is  to  determine  the  set  of  states  Sp  that  have 
a  path  from  an  initial  state  to  an  unsafe  state.  The  set  of  states  Sp  is  computed  using  an  iterative 
algorithm  derived  from  a  fix-point  characterization  of  the  AG  operator  [55].  Let  r  be  the  transition 
relation  of  the  model,  i.e.,  (s,  s')  €  t  if  and  only  if  there  is  a  transition  from  state  s  to  s'.  By 
restricting  the  domain  and  range  of  r  to  Sp  we  obtain  a  transition  relation  rp  that  encapsulates  the 
edges  of  the  scenario  graph.  Therefore,  the  safety  scenario  graph  is  (Sp,  rp,  Sp ,  Sj  ,  D),  where  Sp 
and  rp  represent  the  set  of  nodes  and  set  of  edges  of  the  graph,  respectively,  Sp  =  ,S'o  H  Sp  is  the 
set  of  initial  states,  and  Sp  =  {s|s  £  Sp  A  D(s)  —>  unsafe}  is  the  set  of  final  states. 

In  symbolic  model  checkers,  such  as  NuSMV  [68],  the  transition  relation  and  sets  of  states 
are  represented  using  BDDs,  a  compact  representation  for  boolean  functions.  There  are  efficient 
BDD  algorithms  for  all  operations  used  in  the  algorithm  shown  in  Figure  3.1.  These  algorithms  are 
described  elsewhere  [55,  19],  and  we  omit  them  here. 

Using  the  rs-automata  definition  of  soundness  and  exhaustiveness,  we  show  that  the  safety  sce¬ 
nario  graph  Gp  generated  by  the  algorithm  in  Figure  3.1  is  sound,  exhaustive,  and  succinct. 

Lemma  3.1  (sym-sound)  Let  the  safety  scenario  graph  Gp  =  (Sp,  rp,  Sp ,  Sp ,  D)  be  the  output 
of  the  algorithm  in  Figure  3.1  for  input  model  M  =  (S,  R,  So,  S,  S,  D)  and  safety  property  P  = 
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AG(-i  unsafe).  Then,  any  scenario  7  accepted  by  Gp  is  failing  in  M  with  respect  to  P:  L(Gp)  C 
L(M)\P. 

Proof: 

Let  7  be  a  scenario  accepted  by  Gp.  Then  7  is  finite  and  terminates  in  a  state  contained  in 
Sj  .  By  construction,  unsafe  holds  in  that  state  (step  6).  Furthermore,  Gp  contains  only  states  and 
transitions  from  M ,  so  7  is  accepted  by  M.  7  fails  with  respect  to  P  since  it  has  a  state  where 
unsafe  holds. 

n 


Lemma  3.2  (sym-exhaustive-finite)  A  finite  scenario  a  of  the  input  Biichi  model  M  =  (S,  II.  So,  S, 
S,  D )  is  failing  with  respect  to  property  P  =  AG  (^unsafe)  if  and  only  if  there  is  anf-prefix  7  of 
a  accepted  by  the  safety  scenario  graph  Gp  =  (Sp,Tp,  Sq  ,  Sj" ,  D)  output  by  the  algorithm  in 
Figure  3.1. 

Proof: 

(=>)  Let  a  =  Soto  ■  ■  ■  tj  - 1  ■‘7  be  a  finite  execution  of  M  that  fails  with  respect  to  property 
P  =  AG(-i unsafe),  a  must  contain  at  least  one  state  where  unsafe  is  true.  Let  sn  be  the  first  such 
state,  and  let  7  =  Soto  . . .  tn-\-sn.  We  show  that  Gp  accepts  7  and  that  7  is  an  f-prefix  of  a. 

To  prove  that  7  is  accepted  by  Gp,  it  is  sufficient  to  show  that  (1)  so  G  Sq,  (2)  sn  G  Sj,  and 
(3)  for  all  0  <  k  <  n,  s*.  G  S  and  tj.  G  Tp.  Since  unsafe  holds  at  sn,  by  definition  every  state 
Sk  along  7  violates  AG(-i unsafe).  Therefore,  by  construction  in  step  3,  every  Sk  in  7  is  in  Sp  and 
every  tp  is  in  Tp,  guaranteeing  (3).  (1)  and  (2)  follow  from  steps  5  and  6  of  the  algorithm. 

It  remains  to  show  that  7  is  an  f-prefix  of  a  with  respect  to  property  P  =  AG(-i unsafe).  Let  a' 
be  any  execution  of  M  that  has  the  prefix  7.  a'  contains  the  state  sn,  where  unsafe  holds.  Therefore, 
ol  is  failing  with  respect  to  P.  By  Definitions  2.9  and  2.10,  7  is  an  f-prefix  of  a. 

(<=)  Suppose  that  7  =  soto  ■  ■  ■  tn-\sn  is  a  prefix  of  a  finite  execution  a  =  Soto  ■  ■  ■  tj-iSj  of 
M,  and  that  7  is  accepted  by  Gp.  Then,  the  terminal  state  sn  is  in  Sj,  so  unsafe  is  true  at  sn.  It 
follows  that  a  violates  the  property  AG  (-^unsafe). 

□ 


Lemma  3.3  (sym-exliaustiv  e-infinite)  An  infinite  scenario  (3  of  the  input  Biichi  model  M  =  (S,  R,  So, 
S ,  S,  D )  is  failing  with  respect  to  property  P  =  AG(->  unsafe)  if  and  only  if  there  is  an  i-prefix  7 
of  (3  accepted  by  the  safety  scenario  graph  Gp  =  (Sp,Tp,  Sq,  Sj,  D)  output  by  the  algorithm  in 
Figure  3.1. 


Proof: 

The  proof  is  analogous  to  the  proof  of  Lemma  3.2. 


□ 


Lemma  3.4  (sym-succinct-state)  Let  the  safety  scenario  graph  Gp  =  (Sp,  Tp,  Sq,  Sj  ,  D)  be  the 
output  of  the  algorithm  in  Figure  3.1  for  input  model  M  =  (S,  R,  So,  S,  S,  D )  and  safety  property 
P  =  AG(-i unsafe).  Then,  for  every  state  s  G  Sp  there  exists  a  scenario  a  in  Gp  that  contains  s. 
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Proof: 

By  construction  in  steps  2  and  3  of  the  algorithm,  all  states  generated  for  the  safety  scenario 
graph  Gp  arc  reachable  from  an  initial  state,  and  all  of  them  violate  AG  (-^unsafe).  Therefore,  for 
any  state  s  €  Sp,  there  is  a  path  an  from  an  initial  state  to  s,  and  there  is  a  path  0:2  from  s  to  an 
unsafe  state. 

Let  a  =  cti^.  a  violates  AG(-> unsafe),  so  it  is  accepted  by  Gp.  In  addition,  a  contains  s,  as 
desired. 

□ 

Lemma  3.5  (sym-succinct-transition)  Let  the  safety  scenario  graph  Gp  =  (Sp,  Tp,  Sq,  Sj ,  D) 
be  the  output  of  the  algorithm  in  Figure  3.1  for  input  model  M  =  (S,  R,  Sq,  S,  S,  D)  and  safety 
property  P  =  AG  (-‘unsafe).  Then,  for  every  transition  (s\,t,  S2)  £  Tp  there  exists  a  scenario  a  in 
Gp  that  contains  t. 

Proof: 

By  Lemma  3.4,  there  is  a  scenario  a\  =  qoto  ■  ■  ■  s  1 . . .  tm- 1  qrn  in  Gp  that  contains  state  si 
and  another  scenario  a  2  =  rtf  a  . . .  $2  ■  ■  ■  tn- 1  rn  in  Gp  that  contains  state  S2-  The  scenario  a  = 
qoto  . . .  s\ts2  ■  ■  ■  tn—\rn  includes  both  states  si  and  S2  and  the  transition  t  and  is  accepted  by  Gp. 

□ 

Theorem  3.1  Algorithm  in  Figure  3.1  generates  sound,  exhaustive,  and  succinct  safety  scenario 
graphs. 

Proof: 

Follows  directly  from  Lemmas  3.2,  3.3,  3.1,  3.4,  and  3.5  and  Definitions  2.8,  2.18,  and  2.19. 

n 


3.3  Explicit-State  Model  Checking  Algorithms 

The  symbolic  scenario  graph  algorithm  can  sometimes  take  advantage  of  efficient  state  space  rep¬ 
resentation  inherent  in  the  symbolic  approach.  However,  it  is  not  efficient  for  the  category  of  appli¬ 
cations  that  need  many  variables  to  represent  the  complexities  of  the  problem,  yet  have  relatively 
small  reachable  state  spaces.  The  attack  graph  application  (discussed  in  Paid  II)  falls  into  this  cat¬ 
egory.  In  our  experience  the  symbolic  model  checker  could  spend  hours  constructing  the  binary 
decision  diagrams  for  the  transition  relation  in  step  1  of  the  symbolic  algorithm,  only  to  discover  in 
step  2  that  the  reachable  state  space  is  very  small.  For  such  problems  explicit-state  model  checking 
is  more  appropriate. 

Given  a  model  represented  by  a  Biichi  automaton  M  and  a  correctness  property  P  specified  in 
Lineal-  Temporal  Logic,  our  objective  is  to  build  a  scenario  graph  automaton  Ms  that  recognizes 
precisely  the  set  of  executions  of  M  that  violate  the  property  P.  For  example,  given  the  model 
shown  in  Figure  3.2(a)  and  correctness  property  “P  =  eventually  c  is  true ”,  the  scenario  automaton 
Ms  is  shown  in  Figure  3.2(b). 

In  outline,  our  algorithm  is  similar  to  the  classic  LTL  model  checking  algorithm  used  in  model 
checkers  such  as  SPIN.  Correctness  property  P  induces  a  language  C(P)  of  executions  that  are 
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permitted  under  the  property.  The  executions  of  the  model  M  that  violate  P  thus  constitute  the 
language  C{M)\C{P).  The  scenario  graph  automaton  accepting  this  language  is  Ms  =  MC\—*P.  In 
the  following  sections  we  describe  each  part  of  building  Ms  in  detail  and  consider  several  variations. 

3.3.1  Strongly  Connected  Component  Graphs 

The  explicit  state  algorithms  described  later  in  this  Chapter  will  refer  to  strongly  connected  compo¬ 
nents  of  a  Biichi  automaton.  A  strongly  connected  component  of  a  Biichi  model  M  =  (S,  r,  So,  Sa, 
Sf,D )  is  a  subset  of  states  CCS  such  that  for  any  pair  of  states  si  G  C,  S2  G  C  there  is  a  path 
from  si  to  S2  via  the  transition  relation  r.  Thus,  we  can  partition  the  set  of  states  S  into  strongly 
connected  components  and  construct  a  strongly  connected  component  graph  out  of  them. 

Definition  3.1  Let  M  =  (S,  r,  So,  S„ .  Sf,  D)  be  a  Biichi  model.  The  strongly  connected  compo¬ 
nent  graph  (SCC  graph)  of  M  is  a  triple  SCCm  =  (C,  Er,Co),  where 

•  C  is  the  set  of  strongly  connected  components  of  M 

•  Et  C  C  x  C  is  a  transition  relation  on  C  such  that  (ci,  C2)  G  ET  iff  c\  f  C2  and  there  exists  a 
transition  (si,  S2)  G  r  such  that  s  1  G  ci  and  s 2  G  C2 

•  Co  is  the  set  of  initial  components  such  that  Co  G  Co  iff  Co  Cl  So  f  0. 

An  SCC  graph  cannot  contain  cycles,  since  trivial  cycles  are  ruled  out  by  the  definition  of  ET,  and 
any  nontrivial  cycle  involving  more  than  one  component  would  be  collapsed  into  a  single  compo¬ 
nent.  Thus,  an  SCC  graph  is  always  a  directed  acyclic  graph.  We  can  construct  SCC  graphs  easily 
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Input: 

SCCM  =  (C,  Et,  Co),  an  SCC  graph 
Q  :  C  — >  bool ,  a  predicate  on  the  set  C 

Output: 

C\  a  subset  of  C 

Algorithm  RSET-UNIVERSAL(C,  Et,Co,  Q ) 

C  <—  0; 

V'  <-  0; 

forall  co  £  Co  \  V'  do 

(C,  V')  <-  Visitu(c0,c,  Et,Q,C',  V'y 

end; 

return  C'; 

end  RSET -UNIVERSAL 

function  visitu(c,  C,  ET,  Q,C' ,  V') 

V'  <-  F'  U  {c>; 

forall  c'  €  C  \  V'  s.t.  (c,  c')  6  do 

(c',  v')  <-  visit„(c',c,  et,q,c\  v'y 

end; 

if  Q(c)  A  Vc'  £  C  |  (c,  c')  £  ET  .  c'  £  C1  then 
C'  <-  C'  U  {c}; 

end; 

return  (C' ,  V'); 
end  Visits 


(a)  RSET -UNIVERSAL 


Input: 

SCCM  =  (C,  Et,  Co),  an  SCC  graph 
Q  :  C  — >  bool,  a  predicate  on  the  set  C 

Output: 

C ,  a  subset  of  C 

Algorithm  rset— ex  I  stent  I  al(C,  Et,C0,  Q) 
C  <—  0; 

V'  <-  0; 

forall  co  €  Co  \  V'  do 

(cy  V)  *-  visite(c0,c, et, q,c, v'y 

end; 

return  C'; 

end  RSET -EXISTENTIAL 

function  Visite(c,  C,  BT,  Q,C',  V') 

1/'  «-  V'  U  {c}; 

forall  c'  £  C\V'  s.t.  (c,  c')  6  Sr  do 

(C',  V')  Visite(c',C,£r,  Q,C',  1/'); 

end; 

if  Q(c)  V  3c'  £  C  |  (c,  c')  £  Et  .  c'  £  C'  then 
C  *-  C  U  {c}; 

end; 

return  (C',  V'y 
end  Visite 


(b) RSET-EXISTENTIAL 


Figure  3.3:  RSet  Predicate  Algorithms 


using  Tarjan’s  classic  procedure  for  finding  strongly  connected  components  [90].  Tarjan’s  SCC 
procedure  is  linear  in  the  size  of  r. 

For  each  SCC  graph  SCCm  =  (C.  ET.  Co)  we  define  a  function  RSet:  C  —>  V(C).  For  each 
component  c  €  C,  another  component  d  is  in  RSet(c)  iff  either  c'  =  c  or  d  is  reachable  from  c 
via  the  transition  relation  Er.  That  is,  RSet(c)  is  the  set  of  components  reachable  from  c  via  the 
transition  relation  ET. 

In  the  algorithms  described  later,  the  following  two  sub-problems  come  up  several  times.  Given 
a  Biichi  model  M  and  its  SCC  graph  SCCm  =  (C,  Et,Cq),  for  each  c  6  C  determine  whether  a 
given  boolean  predicate  Q  :  C  — >  bool  is  true 

1.  for  every  component  in  RSet(c),  or 

2.  for  some  component  in  RSet(c). 

To  make  it  more  convenient  to  identify  the  components  described  by  sub-problems  (1)  and  (2),  we 
define  two  more  predicates  on  components. 

Definition  3.2  Given  an  SCC  graph  SCCm  =  (C,  Et,Cq)  and  a  predicate  Q  :  C  — >  bool,  define 
predicates  Q  J  :  C  — - r  bool  and  Q  A  :  C  — >  bool  as  follows: 

Qv(c)  is  true  iffQ  is  true  at  every  component  in  RSet(c) 

Q3(c)  is  true  iffQ  is  true  at  some  component  in  RSet(c). 
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So  sub-problem  (1)  asks  to  find  all  components  in  an  SCC  graph  where  Q  holds.  Similarly,  sub¬ 
problem  (2)  asks  to  find  all  components  in  an  SCC  graph  where  Q  holds.  Figure  3.3  shows  two 
algorithms  that  compute  answers  to  subproblems  (1)  and  (2). 

The  procedure  RSET -UNIVERSAL  in  Figure  3.3(a)  takes  as  input  an  SCC  graph  SCCm  = 
(C,Et,Cq)  and  a  predicate  Q  and  finds  every  component  c  €  C  such  ()  J  is  true  at  c.  RSET -UNIVER¬ 
SAL  traverses  the  SCC  graph  in  depth-first  order,  invoking  the  recursive  function  Vi s it „ on  every 
previously  unexplored  component  that  contains  an  initial  state.  Each  invocation  of  Vi  s  it  u  receives 
two  arguments:  the  set  V  of  components  that  have  been  visited  previously,  and  a  subset  C  C  V  of 
previously  visited  components.  For  each  component  c'  in  this  subset  C .  Q  is  guaranteed  to  be  true 
everywhere  in  RSet(c').  That  is,  Q  is  true  for  every  member  of  C . 

The  task  of  Visit,,  is  to  determine  whether  Q  J  is  true  for  its  argument  component  c.  Visit,,, 
marks  c  as  visited  and  then  invokes  itself  recursively  on  each  successor  of  c.  Each  recursive  call  on 
a  successor  of  c  determines  whether  or  not  Q  J  is  true  at  that  successor;  each  successor  where  ()  ' 
is  true  will  be  added  to  C' .  Once  all  successors  have  been  visited,  Visits  checks  that  Q  is  true 
at  c  and  that  every  successor  of  c  belongs  to  C .  These  conditions  together  guarantee  that  Q  is  true 
at  every  member  of  RSet(c),  so  Visits  adds  c  to  C .  When  every  component  in  the  SCC  graph 
SCCm  has  been  visited  by  Visits,  the  set  C  contains  precisely  those  components  that  satisfy  the 
condition  in  subproblem  (1). 

Lemma  3.6  (visit-u-invariant)  Suppose  function  Visitu  in  Figure  3.3(a)  is  invoked  with  the  ar¬ 
guments  (c,  C,  Er,  Q ,  C\  V'),  where 
C  is  a  set  of  components 
Er  :  C  x  C  is  a  transition  relation 
Q  :  C  — >  bool  is  a  predicate 
V  C  C  is  the  set  of  previously  visited  components 
C  C  V'  is  the  subset  of  previously  visited  components  where  Q1  holds, 
c  is  a  member  of  C\V' 

Then  the  first  set  of  components  output  by  function  Visitu  contains  the  entire  set  C  plus  every 
component  c'  £  RSet(c)  \  V'  such  that  QV(c/)  holds.  The  second  set  of  components  output  by 
Visitu  is  equal  to  V'G  RSet(c). 

Proof: 

The  proof  proceeds  by  induction  on  the  diameter  of  the  subgraph  rooted  at  c.  When  the  diameter 
is  1,  c  has  no  successors  and  RSet(c)  =  {c},  so  Qv(c)  holds  if  and  only  if  Q(c )  holds.  In  this  case 
Visits  will  not  invoke  itself  recursively.  Visits  adds  c  to  V  and  also  adds  c  to  C  whenever 
Q(c)  holds,  as  desired. 

For  the  inductive  case  (diameter  =  n),  Visits  invokes  itself  on  each  previously  unexplored 
successor  of  c.  The  subgraph  rooted  at  each  successor  of  c  has  a  diameter  smaller  than  n,  so  we  can 
apply  the  inductive  hypothesis  to  each  recursive  invocation  of  Visitn  and  conclude  that  at  the  end 
of  the  forall  loop  every  component  reachable  from  c  has  been  added  to  V  and  every  component 
reachable  from  c  for  which  O  '  holds  has  been  added  to  C . 

It  remains  to  check  that  c  itself  is  added  to  C'  if  and  only  if  Q  ‘  [c)  holds.  The  if  statement 
following  the  forall  loop  adds  c  to  C  when  Q  holds  at  c  and  every  successor  of  c  is  in  C .  The  fact 
that  every  successor  of  c  is  in  C  means  that  Q  '  holds  at  every  successor  of  c,  which  in  turn  means 
that  Q  holds  at  every  state  reachable  from  c.  So  Qv  holds  at  c,  as  desired. 

n 
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Theorem  3.2  (rset-universal-correct)  Let  C  be  the  output  of  algorithm  RSET-UNIVERSAL  on  the 
input  SCC  graph  SCCm  =  (C,  Er,Cf)  and  predicate  Q.  Then  C  is  precisely  the  set  of  components 
that  are  reachable  from  some  component  in  Co  and  where  Qv  holds. 

Proof: 

RSET-UNIVERSAL  invokes  Visit,,,  on  every  component  of  SCCm  that  contains  an  initial 
state.  The  Theorem  follows  directly  by  applying  invariant  Lemma  3.6  to  each  of  these  invocations 

of  Visits. 

□ 

The  procedure  RSET -EXISTENTIAL  in  Figure  3.3(b)  is  almost  identical  to  RSET-UNIVERSAL. 
It  takes  as  input  an  SCC  graph  SCCm  =  (C,Et,Cq)  and  a  predicate  Q  and  finds  every  com¬ 
ponent  c  £  C  such  Q3  is  true  at  c.  The  recursive  function  Visite  plays  the  same  role  for 
RSET-EXISTENTIAL  as  the  function  Visitn  plays  for  RSET-UNIVERSAL.  The  only  differ¬ 
ence  between  RSET-EXISTENTIAL  and  RSET-UNIVERSAL  is  in  the  condition  of  the  if  state¬ 
ment  inside  the  function  V±s±te.  To  ensure  that  component  c  is  added  to  C  if  and  only  if  Q3  is 
true  at  c,  the  if  statement  checks  that  Q  is  true  at  c  or  at  one  of  c’s  successors. 

Lemma  3.7  and  Theorem  3.3  guarantee  correctness  of  the  algorithm  RSET-EXISTENTIAL. 
We  omit  the  proofs,  which  arc  analogous  to  the  proofs  of  Lemma  3.6  and  Theorem  3.2. 

Lemma  3.7  (visit-e-invariant)  Suppose  function  Visite  in  Figure  3.3(b)  is  invoked  with  the  ar¬ 
guments  (c,  C,  Er,  Q ,  C',  V'),  where 

C  is  a  set  of  components 

Et  :  C  x  C  is  a  transition  relation 

Q  :  C  — >  bool  is  a  predicate 

V  C  C  is  the  set  of  previously  visited  components 

C  C  V'  is  the  subset  of  previously  visited  components  where  Q3  holds, 

c  is  a  member  of  C\V' 

Then  the  first  set  of  components  output  by  function  Visitu  contains  the  entire  set  C  plus  every 
component  c'  G  RSet(c)  \  V'  such  that  Q3(d)  holds.  The  second  set  of  components  output  by 
Visitu  is  equal  to  V'G  RSet(c). 

Theorem  3.3  ( rset-existential-correct )  Let  C  be  the  output  of  algorithm  RSET-EXISTENTIAL 
on  the  input  SCC  graph  SCCm  =  ( C,Et,Cq )  and  predicate  Q.  Then  C'  is  precisely  the  set  of 
components  that  are  reachable  from  some  component  in  C  o  and  where  Q3  holds. 


3.3.2  Biichi  Automata  Intersection 

The  explicit-state  model  checking  algorithm  is  based  on  constructing  the  intersection  of  the  model 
automaton  M  and  the  automaton  N p  representing  the  negation  of  the  correctness  property.  Let 

M  =  (S,rm,S^,Sa,Sf,D) 


and 
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Input: 

M  =  (S,t,  So,  S,  S,  D)  -  the  model  (Biichi  automaton) 

P  -  an  LTL  correctness  property 

Output: 

Scenario  automaton  Ms,  where  C(MS)  =  C(M)\C(p) 

Algorithm:  GenerateScenarioAutomaton(M,  P) 

-  Convert  —> p  to  a  Biichi  automaton  Np 

(1)  Np  <—  ConvertToBuchi(-iP); 

-  Construct  an  intersection  of  M  and  Np 

(2)  /  =  (S\  T\  SI  Si  S),  Dl)  <—  M  fi  Np- 

-  Compute  the  SCC  graph  of  I 

(3)  SCCi  =  (C,Et,C0)  <-  Tar  janSCC(I); 

-  Define  a  predicate  on  C  identifying  SCCs  with  a  final  state 

-  or  an  acceptance  cycle 

(4)  Qs  <—  A  c  :  C.(3s  £  c.(s  £  Sj  V  (c  has  an  acceptance  cycle)))-, 

-  Compute  the  set  of  components  where  Q3S  is  true 

(5)  Cs  <-  rset-existential(C,jEt,C0,Qs); 

-  Cs  comprises  the  set  of  states  of  the  result  scenario  automaton 

(6) Ss^UceCs  c- 

-  Restrict  the  transition  relation  to  the  states  in  Ss 

(7)  ts  <-  t1  n  (Ss  x  ss). 

-  Ms  consists  of  SCCs  of  I  that  can  reach  a  component  with  an  acceptance  state 

-  or  a  final  state 

(8)  ms  <-  (Ss,  Ts,  sl  n  Ss,  si  n  s8,  s }  n  s8,  d% 

(9)  Return  Ms. 


Figure  3.4:  Explicit-State  Algorithm  for  Generating  Scenario  Automata 


Np  =  (Sp,  tp,  S®,  Sp,  S£,  0) 

be  the  Biichi  automata  representing  the  model  and  the  negation  of  the  correctness  property,  respec¬ 
tively.  When  checking  correctness,  we  want  to  include  every  possible  finite  and  infinite  execution 
of  M,  so  every  state  of  M  is  both  accepting  and  final: 

Sa  =  S 
Sf  =  S 

We  use  a  simplified  definition  of  Biichi  automata  intersection  that  is  valid  whenever  all  states  of  one 
automaton  are  all  accepting  and  final.  The  intersection  of  M  and  Np  is  defined  as  follows: 

Ms  =  M  nNP  =  {Sx  Sp,  t,  Si  x  Sj,  S  x  S“,  5  x  5/,  D) 

Here  the  transition  relation  r  is  such  that  ((r i,  qi),t,  (r2,  <72))  €  r  if  and  only  if  (r\,t,  r 2)  £  rm 

and  (q\,t,  (72)  £  tp.  A  state  (r,  q)  is  accepting  in  M  n  Np  if  and  only  if  state  q  is  accepting  in  Np. 

Likewise,  a  state  (r,  q)  is  final  in  M  ft  Np  if  and  only  if  state  q  is  final  in  Np. 
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3.3.3  Computing  Full  Scenario  Automata 

An  algorithm  for  generating  scenario  automaton  Ms  from  a  Biichi  model  M  and  an  LTL  correctness 
property  P  is  shown  in  Figure  3.4  and  illustrated  in  Figure  3.5.  The  algorithm  uses  Gerth  et.al.' s 
procedure  [39]  to  convert  the  LTL  correctness  formula  P  to  a  Biichi  automaton  Np  (line  1)  and 
computes  the  intersection  I  =  M  n  Np  (line  2). 

The  intersection  automaton  I  accepts  the  set  of  failing  scenarios  of  the  model  M.  However,  it 
also  represents  the  entire  state  space  of  the  model.  The  next  several  steps  of  the  algorithm  separate 
the  states  of  I  that  do  no  participate  in  any  failing  scenarios.  Step  3  takes  the  intersection  automaton 
/  (Figure  3.5(a))  and  computes  its  SCC  graph  (Figure  3.5(b)),  using  a  version  of  the  classic  SCC 
algorithm  due  to  R.  Tarjan  [90]. 

Let  any  cycle  containing  at  least  one  acceptance  state  be  called  an  acceptance  cycle.  Steps  4 
and  5  use  the  RSET -EXISTENTIAL  algorithm  from  Section  3.3.1  to  compute  the  set  of  strongly 
connected  components  Cs  such  that  from  each  state  s  <E  UceCs  c  *s  possible  to  reach  an  SCC 
with  either  a  final  state  or  an  acceptance  cycle.  To  find  these  strongly  connected  components, 
RSET-EXISTENTIAL  is  invoked  with  the  following  predicate  on  the  components  of  SCC  j : 

Qs  =  A  c  :  C  .  (3s  €  c .  (s  is  final  in  /))  V  (c  has  an  acceptance  cycle ) 

In  Figure  3.5(c)  the  set  Cs  of  SCCs  found  in  step  5  is  shaded.  Components  2  and  3  contain  an 
acceptance  cycle.  Component  5  has  final  states.  Components  1  and  4  are  included  in  the  set  because 
from  any  state  in  those  components  it  is  possible  to  reach  component  5. 

The  SCCs  found  by  RSET-EXISTENTIAL  comprise  the  set  of  states  of  the  result  scenario 
automaton  (Figure  3.5(d)).  To  ensure  succinctness,  we  discard  the  components  of  I  that  do  not 
belong  to  the  set  Cs  (components  6,  7,  and  8  in  Figure  3.5(c)). 

The  following  Lemmas  show  that  the  algorithm  in  Figure  3.4  produces  sound,  exhaustive,  and 
succinct  scenario  automata. 

Lemma  3.8  ( explicit-sound )  If  a  is  an  execution  in  the  the  output  automaton  Ms  =  (Ss,ts,  Sq, 
Sj,  Ls),  then  a  violates  the  correctness  property  P  in  the  input  automaton  M  =  ( S ,  r,  S o,  S',  S,  D): 
L(MS )  C  L(M)\P. 

Proof: 

Let  a  =  Soto  •  •  •  tn-\sn  ...  be  an  execution  of  the  output  automaton  Ms.  Then  a  is  also  an 
execution  of  I  =  M  fl  Np.  Thus,  a  is  an  execution  of  M  and  violates  the  property  P. 

□ 

Lemma  3.9  (explicit- exhaustive)  If  an  execution  a  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D) 
violates  the  correctness  property  P,  then  a  is  accepted  by  the  output  automaton  Ms  =  (Ss,  rs,  S(], 
Sj,Ls):  L(MS)  D  L(M)\P. 

Proof: 

Let  a  =  Soto  •  •  -tn_isn ...  be  an  execution  of  the  input  model  M  such  that  a  violates  the 
correctness  property  P.  Since  a  violates  P,  every  state  and  transition  in  a  will  be  present  in  the 
intersection  automaton  I  =  M  T l  Np  ([19],  pp.  123-125).  To  prove  that  a  is  an  execution  in  Ms, 
we  first  show  that  every  state  and  every  transition  in  a  is  present  in  Ms.  There  arc  two  cases. 


Mn-iP 


(a)  (b) 


(c) 


o 

regular  state 

s.  / 

initial  state 

o 

acceptance  state 

(d) 


final  state 

acceptance  +  final  state 


Figure  3.5:  Explicit  Algorithm  Example 
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Case  1  Suppose  that  a  is  infinite.  Then  there  must  be  an  acceptance  cycle  of  I  that  a  traverses 
infinitely  often.  Let  sa  be  the  an  acceptance  state  in  this  cycle.  All  states  and  transitions  following 
sa  in  a  must  be  in  the  strongly  connected  component  ca  of  I  that  contains  sa.  Since  the  component 
ca  contains  an  acceptance  cycle,  step  5  adds  ca  to  Cs,  and  subsequently  steps  6  and  7  add  the  states 
and  transitions  of  ca  to  Ms. 

All  states  and  transitions  preceding  sa  in  a  belong  to  strongly  connected  components  from 
which  it  is  possible  to  reach  sa.  By  Theorem  3.3  (rset-existential-correct),  in  step  5  all  of  those 
components  will  be  added  to  the  set  Cs.  So  steps  6  and  7  also  add  the  states  and  transitions  belonging 
to  those  components  to  Ms. 

Case  2  Suppose  that  a  is  finite.  Let  Sf  be  the  first  final  state  of  /  reached  by  a.  All  states  and 
transitions  following  Sf  in  a  must  be  in  the  strongly  connected  component  c/  of  I  that  contains  Sf. 
Step  5  adds  cj  to  Cs,  and  subsequently  steps  6  and  7  add  the  states  and  transitions  of  cj  to  Ms. 

All  states  and  transitions  preceding  Sf  in  a  belong  to  strongly  connected  components  from 
which  it  is  possible  to  reach  Sf.  By  Theorem  3.3  (rset-existential-correct),  in  step  5  all  of  those 
components  will  be  added  to  the  set  Cs.  So  steps  6  and  7  also  add  the  states  and  transitions  belonging 
to  those  components  to  Ms. 

It  remains  to  show  that  a  is  accepted  by  Ms.  a  violates  the  correctness  property  P,  so  it  is 
accepted  by  the  negated  property  automaton  Np,  and  therefore  also  by  the  intersection  I  =  M nNp. 
All  accepting  states  of  I  that  also  exist  in  Ms  are  accepting  in  Ms.  Similarly,  all  final  states  of  I 
that  also  exist  in  Ms  arc  final  in  Ms.  Since  a  is  accepted  by  I,  it  must  also  be  accepted  by  Ms. 

a 

Lemma  3.10  (explicit-succinct  state)  A  state  s  of  the  input  automaton  M  =  (S,  r,  So .  S',  S,  D )  is  a 
state  in  the  output  automaton  Ms  =  ( Ss,ts ,  Sq,  S^,  Sj,  Ds)  if  and  only  if  there  is  an  execution  a  of 
the  input  automaton  M  =  (S,  r,  So,  S,  S,  D )  that  violates  the  correctness  property  P  and  contains 
s. 


Proof: 

(=^>)  Let  s  be  a  state  in  the  output  automaton  Ms  =  (Ss,ts,  Sq,  Sj,  Ls).  By  construction  in 
steps  4-7,  s  is  either  a  member  of  a  reachable  SCC  that  contains  a  final  state  or  an  acceptance  cycle, 
or  it  is  a  member  of  a  reachable  SCC  from  which  it  is  possible  to  reach  another  SCC  that  contains 
an  acceptance  cycle  or  a  final  state.  In  either  case,  there  exists  a  path  a  in  Ms  that  starts  in  an 
initial  state,  goes  through  s,  and  then  either  ends  in  a  final  state  or  loops  around  an  SCC  visiting  an 
acceptance  state  infinitely  often.  By  definition,  the  path  a  is  an  execution  of  Ms.  By  lemma  3.8,  a 
is  an  execution  of  the  input  automaton  M  and  it  violates  correctness  property  P. 

(<t=)  If  there  is  an  execution  in  Ms  that  contains  s,  then  trivially  s  is  in  Ms.  □ 

Lemma  3.11  (explicit-succinct  transition)  A  transition  t  =  (si,S2)  £  t  of  the  input  automaton 
M  =  ( S ,  r,  S0,  S,  S ,  D)  is  a  transition  in  the  output  automaton  Ms  =  ( Ss,ts ,  Sq,  S Sj,  Ds )  if 
and  only  if  there  is  an  execution  a  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D)  that  violates  the 
correctness  property  P  and  contains  t. 

Proof: 

(=>)  Let  t  be  a  transition  in  the  output  automaton  Ms  =  (Ss,ts,  Sq,  S'®,  Sj ,  Ls).  The  proof  that 
t  must  lie  on  an  execution  of  Ms  that  is  also  an  execution  of  M  and  violates  P  is  analogous  to  the 
proof  of  Lemma  3.10,  (=f-). 

(<=)  If  there  is  an  execution  in  Ms  that  contains  t,  then  trivially  t  is  in  Ms.  n 
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Theorem  3.4  The  algorithm  in  Figure  3.4  generates  sound ,  exhaustive,  and  succinct  scenario  au¬ 
tomata. 

Proof: 

Follows  directly  from  Lemmas  3.9,  3.8,  3.10,  and  3.1 1  and  Definitions  2.6,  2.7 ,  and  2.8. 

□ 


3.3.4  Computing  RS-Automata  for  Finite  Scenarios 

Now  we  turn  our  attention  to  the  task  of  producing  rs-automata  that  arc  smaller  in  size  than  the  full 
scenario  automata  computed  by  the  algorithm  GenerateScenarioAutomaton.  For  simplicity, 
we  will  treat  finite  and  infinite  executions  separately. 

Given  a  Biichi  model  M  and  an  LTL  correctness  property  P,  our  basic  approach  still  relies  on 
extracting  a  suitable  sub-graph  of  the  intersection  automaton  I  =  M  n  -<P.  To  begin,  we  prove 
some  facts  about  f-representative  scenarios  with  respect  to  I. 

Lemma  3.12  Given  a  Biichi  automaton  M  and  a  correctness  property  P,  a  finite  execution  7  = 
Soto  . . .  tn—\  sn  in  M  is  f-representative  with  respect  to  P  if  and  only  if  every  state  reachable  from 
sn  in  the  automaton  I  =  M  C I  —>P  is  final. 

Proof: 

(=$■)  Let  7  =  sot-o  ■  ■  ■  tn- 1  sn  be  an  f-representative  execution  in  M  with  respect  to  P.  Let 
s  be  any  state  reachable  from  sn  in  I  via  some  path  ft.  The  finite  execution  a  =  y ft  has  an  f- 
representative  prefix  7,  so  by  Definition  2.9  the  execution  a  is  failing  with  respect  to  property  P. 
Automaton  I  accepts  all  failing  executions  of  M,  so  it  must  accept  the  execution  a.  Therefore,  the 
terminal  state  s  of  a  is  final  in  I. 

(<=)  Let  7  =  so^o  •  •  •  tn—\sn  be  an  finite  execution  in  M  with  respect  to  P.  Assume  that 
every  state  reachable  from  7’s  terminal  state  sn  in  the  automaton  I  is  final.  To  prove  that  7  is  f- 
representative  with  respect  to  P,  we  have  to  show  that  any  finite  scenario  of  M  with  prefix  7  is 
failing  with  respect  to  P.  Let  a  =  yft  be  a  finite  scenario  of  M.  By  assumption  the  terminal  state 
of  a  is  final  in  I,  so  a  is  accepted  by  I.  I  accepts  only  failing  scenarios  of  M  with  respect  to  P, 
therefore  a  fails  with  respect  to  P. 

□ 


Given  a  Biichi  automaton  M  and  a  correctness  property  P,  let  I  =  M  n  ~*P  and  let  SCCj  = 
(C.  Et,Cq)  be  the  SCC  graph  of  automaton  I.  Define  the  following  predicate  on  the  components  of 
SCCr: 


Qf  =  X  c  :  C  .  (Vs  G  c  .  s  is  final  in  I) 

Then  we  have  the  following  Corollary  to  Lemma  3. 12: 

Corollary  3.1  A  finite  execution  7  =  soto  ■  ■  ■  tn—  1  sn  in  M  is  f-representative  with  respect  to  P  if 
and  only  if  in  the  SCC  graph  SCCj,  Q ^  is  true  for  the  component  containing  state  sn. 
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Input: 

M  =  (S.  t.  So,  S,  S.  D)  -  the  model  (Biichi  automaton) 

P  -  an  LTL  correctness  property 

Output: 

Rs-automaton  Mrp  =  (Mr ,  Srp  0) 

Algorithm:  F-RSAutomaton (M,P) 

-  Convert  —>p  to  a  Biichi  automaton  Np 

(1)  Np  <—  ConvertToBiichi(^P); 

-  Construct  an  intersection  of  M  and  Np 

(2)  I  =  0 S\t\  S^Sl  Si  S),  Di  <—  M  fl  Np- 

-  Compute  the  SCC  graph  of  I 

(3)  SCCi  =  (C,Et,Co)  «-  Tar  janSCC(I); 

-  Define  a  predicate  on  C  identifying  SCCs  with  a  final  state 

(4)  Qs  <—  A  c  :  C.(3s  G  c.(s  G  S})); 

-  Define  a  predicate  on  C  identifying  SCCs  where  every  state  is  final 

(5)  Qf  A  c  :  C.(Vs  G  c.(s  G  <5})); 

-  Compute  the  set  of  components  where  Qj  is  true 

(6)  Cs  <-  RSET-EXI  STENTIAL(C,  Et,Cq,  Qs)', 

-  Compute  the  set  of  components  where  (fj  is  true 

(7)  Cf  <—  rset-universal(C,  Et,Cq,  Qf)-, 

-  Compute  the  set  of  components  where  Q3S  is  true  and  Qj  is  false 

(8)  C,  ^Cs\Cf- 

-  Compute  the  set  of  components  where  both  (fj  and  (fj  are  true 

(9)  C2  <—  cs  n  Cf-, 

-  Gather  all  the  states  in  C\ 

(10)  5r  -  UceCl  c; 

-  Isolate  the  initial  states  0/C2 

(11)  S2  «-  Sj  n  (Ucec2  c); 

-  Isolate  the  border  states  0/C2 

(12)  53  <-  {s  G  (UceC2  c)  |  3s'  G  Si  .  (s',  s)  G  ff 

-  The  rs-automaton  includes  all  states  in  C\,  initial  states  of  C2,  and  border  states  0/C2 

(13)  Ss  4-  Si  u  s2  u  S3. 

-  Restrict  the  transition  relation  to  the  states  in  Ss 

(14)  Ts  <-  {(si,S2)  G  Tl  |  (si  G  5S  A  s2  G  Ss)}. 

(15)  Mr  <-  (ss,ts,  si0  n  Ss,  0,  S)  n  ss,  d1). 

-  The  set  Srf  of  f-scenario  end-points  consists  of  initial  states  0/C2  and  border  states  0/C2 

(16)  Return  MrP  =  (Mr,  S2  U  S3, 0). 


Figure  3.6:  Algorithm  for  Generating  F-Exhaustive  RS-Automata 
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Corollary  3.1  suggests  a  way  to  modify  the  algorithm  GenerateScenarioAutomaton  to  com¬ 
pute  an  f-exhaustive  rs-automaton  Mrp  =  (Mr.  Srj.  Srj)  instead  of  the  full  scenario  automaton  Ms. 
The  modified  algorithm  F-RSAutomaton  is  shown  in  Figure  3.6  and  illustrated  in  Figure  3.7. 
The  result  rs-automaton  accepts  only  f-scenarios  and  finite  r-scenarios;  we  treat  infinite  scenarios 
separately  in  Section  3.3.5. 

The  algorithm  begins  by  computing  the  intersection  automaton  I  =  Mf)-<P  (Figure  3.7(a))  and 
its  strongly  connected  component  graph  SCCj  (Figure  3.7(b)).  Lines  4  and  5  define  two  predicates 
on  the  components  in  SCCj,  Qs  and  Qj.  The  predicate  Qs  identifies  SCCs  with  at  least  one  final 
state.  The  predicate  Q f  identifies  SCCs  where  every  state  is  final. 

Lines  6  and  7  compute  two  sets  of  components  using  the  RSET  algorithms.  Set  Cs  consists 
of  components  where  Qj  holds,  and  set  Cf  consists  of  components  where  Qj  holds.  Next,  the 
algorithm  splits  the  SCCs  of  /  into  three  classes: 

1.  Components  where  Qj]  is  true  and  Qj  is  false  (set  C\). 

2.  Components  where  both  Qj  and  Qj  are  true  (set  C2). 

3.  All  other  components. 

The  three  component  classes  are  shown  in  Figure  3.7(c)  with  distinct  shades.  Components  in  class 
1  consist  of  states  from  which  it  is  possible  to  reach  a  final  state.  Furthermore,  since  Qj  is  false 
for  class  1  components,  by  Corollary  3.1  f-representative  scenarios  cannot  terminate  inside  class 
1  components.  It  is  therefore  necessary  to  include  class  1  components  in  the  result  rs-automaton 
in  their  entirety  (state  set  S±,  lines  10  and  13).  The  automaton  in  Figure  3.7(c)  has  four  class  1 
components,  numbered  1  through  4.  Every  state  and  transition  in  those  components  is  included  in 
the  result  automaton  in  Figure  3.7(d). 

Components  in  class  2  can  contain  initial  states.  One  such  state  is  shown  in  Figure  3.7(c)  in 
component  5.  In  all  component  2  initial  states  Qj  is  true.  By  Corollary  3.1  finite  scenarios  of  length 

1  stalling  and  ending  in  such  initial  states  are  f-representative.  The  algorithm  places  initial  states 
from  class  2  components  in  the  result  rs-automaton  (state  set  S2,  lines  1 1  and  13).  It  should  be  noted 
that  each  initial  state  in  a  class  2  component  signifies  that  every  finite  scenario  starting  in  the  state 
fails  with  respect  P,  meaning  that  the  model  M  violates  P  in  its  initial  state.  Therefore,  in  practice 
class  2  components  are  unlikely  to  contain  initial  states. 

Since  Qj  is  true  in  all  of  the  states  in  class  2  components.  Corollary  3.1  guarantees  that  any  finite 
scenario  ending  in  a  class  2  component  is  f-representative.  Therefore,  it  is  sufficient  to  include  in  the 
result  rs-automaton  only  those  border  states  of  class  2  components  that  have  an  incoming  transition 
from  a  class  1  component  (state  set  S3,  lines  12  and  13).  Components  5  and  7  in  Figure  3.7(c)  have 
one  border  state  each.  Both  border  states  arc  included  in  the  result  automaton  in  Figure  3.7(d). 

Finally,  components  in  class  3  have  no  final  states,  so  they  do  not  participate  in  any  finite  sce¬ 
narios  accepted  by  automaton  I  and  can  therefore  be  safely  discarded. 

Three  points  about  the  three  component  classes  arc  important  in  the  proofs  of  correctness  that 
follow.  First,  to  justify  the  case  analysis  in  the  proofs  we  observe  that  the  classes  arc  mutually  exclu¬ 
sive;  an  SCC  can  belong  to  one  and  only  one  of  the  three  classes.  Second,  once  a  path  enters  a  class 

2  component,  it  can  never  re-enter  a  class  1  component.  Third,  once  a  path  enters  a  class  3  com¬ 
ponent,  it  can  never  re-enter  a  component  from  any  of  the  other  three  classes.  These  observations 
follow  trivially  from  the  definitions  of  the  component  classes. 

Lemmas  3.13,  3.14,  3.15,  and  3. 16  refer  to  inputs  and  outputs  of  the  algorithm  F-RS  Automat¬ 
on.  The  proofs  refer  to  the  three  component  classes  defined  above. 
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Figure  3.7:  F-Exhaustive  RS-Automaton  Algorithm  Example 
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Lemma  3.13  (f-rs-sound)  If  7  is  an  execution  in  the  the  output  rs-automaton  Mf  =  ( Mr ,  S,p,  0), 
then  7  is  failing  scenario  of  the  input  automaton  M  with  respect  to  P. 

Proof: 

Let  7  =  sot.Q  . . .  tn-\sn.  Since  7  is  accepted  by  the  output  rs-automaton  Mrp ,  the  state  sn  must 
be  final  in  Mp.  Therefore,  sn  is  also  final  in  I  (step  15).  So  7  is  accepted  by  /,  and  must  be  a  failing 
scenario  of  M  with  respect  to  P. 

□ 

Lemma  3.14  (f-rs-exhaustive)  If  a  finite  execution  a  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D ) 
violates  the  input  correctness  property  P,  then  for  the  output  rs-automaton  Mp  =  (Mr,  Srp  0)  one 
of  the  following  holds: 

1.  a  is  an  r-scenario  in  Mp,  or 

2.  anfn-prefix  of  a  is  an  f- scenario  in  Mffor  some  value  ofn 

Proof: 

Let  a  =  Soto  •  •  •  tn~ \sn  be  a  finite  execution  of  the  input  model  such  that  a  violates  the  correct¬ 
ness  property  P.  Since  a  violates  the  correctness  property  P,  every  state  and  transition  in  a  will  be 
present  in  the  intersection  automaton  I  =  M  n  Np  ([19],  pp.  123-125). 

Consider  the  initial  and  the  terminal  states  of  a  with  respect  to  the  SCC  graph  of  intersection 
automaton  I.  The  following  case  analysis  covers  the  possible  locations  for  these  two  states. 

Case  1  Both  sq  and  sn  are  in  class  1  components.  In  this  case,  the  entire  execution  a  must 
go  through  class  1  components,  so  the  algorithm  includes  all  states  of  a  in  the  output  rs-automaton 
Mrp  (lines  10  and  13).  By  construction,  all  the  final  states  of  /  that  also  exist  in  Mp  are  final  in  Mp 
(line  15),  so  state  sn  is  final  in  Mp.  Therefore,  a  is  an  r-scenario  in  Mrp. 

Case  2  sq  is  in  a  class  2  component.  In  this  case  the  algorithm  adds  state  so  to  Mp  in  steps  1 1 
and  13.  Furthermore,  so  is  placed  in  the  set  S', 7  of  ('-scenario  termination  points  in  step  16.  Finally, 
all  states  belonging  to  class  2  components  are  final  in  I  and  therefore  final  in  Mp  (step  15).  So  so 
is  final  in  Mp.  We  conclude  that  the  trivial  execution  consisting  of  state  so,  which  is  the  fi-prefix 
of  a,  is  an  ('-scenario  in  Mp. 

Case  3  so  is  in  a  class  1  component  and  sn  is  in  a  class  2  component.  In  this  case  at  some 
point  a  must  transition  from  a  state  s /,  belonging  to  a  class  1  component  to  a  state  s/,.+  ]  belonging 
to  a  class  2  component.  All  the  states  prior  to  Sk  belong  to  class  1  components,  so  they  are  added 
to  the  output  rs-automaton  in  steps  10  and  13.  Steps  12  and  13  add  state  s^+i  to  the  automaton. 
In  addition,  step  16  adds  state  to  the  set  S,y.  Finally,  state  Sfc+i  is  final  in  /,  so  it  is  also 
final  in  Mp.  Therefore,  the  finite  execution  7  =  S(fo  . .  .tk-iSktkSk+i  is  an  ('-scenario  in  Mp. 
We  know  that  is  true  in  the  component  containing  state  Sfc+i,  so  by  Corollary  3.1  execution  7 
is  f-representative  in  M.  We  conclude  that  7  is  an  f/.+ 1  -prefix  of  a  and  an  ('-scenario  in  Mp,  as 
required. 

Case  4  The  remaining  cases  arc  impossible.  The  terminal  state  sn  of  a  is  final,  so  a  cannot  go 
through  any  states  in  a  class  3  component. 

□ 

Lemma  3.15  (f-rs-succinct  state)  A  state  s  of  the  input  automaton  M  =  (S,  r,  Sq,  S,  S.  D )  is  a 
state  in  the  output  rs-automaton  Mp  =  (Mr,  Srp  0)  if  and  only  if  there  is  a  finite  execution  a  of 
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the  input  automaton  M  =  (5,  r,  So,  S,  S ,  D)  that  violates  the  correctness  property  P  and  contains 
s. 


Proof: 

(=£>)  Let  s  he  a  state  in  the  output  rs-automaton  Mp  =  (Mr,  Srj,  0).  By  construction,  s  is  is 
a  member  of  a  reachable  SCC  from  which  it  is  possible  to  reach  another  SCC  that  contains  a  final 
state.  So  there  exists  a  path  a  in  Mrp  that  starts  in  an  initial  state,  goes  through  s,  and  ends  in  a 
final  state.  By  definition,  a  is  an  execution  of  Mrp.  By  lemma  3.13,  a  is  an  execution  of  the  input 
automaton  M  and  it  violates  correctness  property  P. 

(<=)  If  there  is  an  execution  in  Mp  that  contains  s,  then  trivially  s  is  in  Mp.  n 


Lemma  3.16  (f-rs-succinct  transition )  A  transition  t  =  (s  i,  S2)  G  r  of  the  input  automaton  M  = 
( S ,  r,  S0,  S,  S ,  D)  is  a  transition  in  the  output  rs-automaton  Mp  =  ( Mr ,  Srp  0)  if  and  only  if  there 
is  a  finite  execution  a  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D)  that  violates  the  correctness 
property  P  and  contains  t. 

Proof: 

(=$■)  Let  t  be  a  transition  in  the  output  rs-automaton  Mfi  =  ( Mr ,  Srp  0).  The  proof  that  t  must 
lie  on  an  execution  of  Ms  that  is  also  an  execution  of  M  and  violates  P  is  analogous  to  the  proof  of 
Lemma  3.15,  (=^>). 

(<=)  If  there  is  an  execution  in  Ms  that  contains  t,  then  trivially  t  is  in  Ms.  □ 

Theorem  3.5  The  algorithm  in  Figure  3.6  generates  sound,  f-exhaustive,  and  succinct  rs-automata. 

Proof: 

Follows  directly  from  Lemmas  3.14,  3.13,  3.15,  and  3.16  and  Definitions  2.18,  2.20,  and  2.8. 

□ 


3.3.5  Computing  RS-Automata  for  Infi  nite  Scenarios 

Lemma  3. 12  and  Corollary  3. 1  have  counterparts  for  i-representative  scenarios. 

Lemma  3.17  Given  a  Biichi  automaton  M  and  a  correctness  property  P,  a  finite  execution  7  = 
so^o  •  •  •  tn-i-Sn  in  M  is  i-representative  with  respect  to  P  if  and  only  if  every  cycle  reachable  from 
sn  in  the  automaton  I  =  M  C I  ^P  contains  an  acceptance  state. 

Proof: 

(=£>)  Let  7  =  Soto  •  •  •  tn-iSn  be  an  i-representative  execution  in  M  with  respect  to  P.  Let 
6  =  Sq£q  . . .  t'n_  |  s[j  be  any  cycle  reachable  from  sn  in  I  via  the  path  6.  The  infinite  execution 
a  =  7 (3006 . . .  has  an  i-representative  prefix  7,  so  by  Definition  2.13  the  execution  a  is  failing 
with  respect  to  property  P.  Automaton  I  accepts  all  failing  executions  of  M,  so  it  must  accept  the 
execution  a.  a  must  visit  an  acceptance  state  infinitely  often,  so  the  cycle  6  contains  an  acceptance 
state. 

(<=)  Let  7  =  soto  •  •  •  tn-iSn  be  an  finite  execution  in  M  with  respect  to  P.  Assume  that  every 
cycle  reachable  from  7’s  final  state  sn  in  the  automaton  /  contains  an  acceptance  state.  To  prove 
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that  7  is  i-representative  with  respect  to  P,  we  have  to  show  that  any  infinite  scenario  of  M  with 
prefix  7  is  failing  with  respect  to  P.  Let  a  =  7/3  be  an  infinite  scenario  of  M.  f3  is  an  infinite  suffix 
of  a,  and  it  must  loop  around  at  least  one  cycle  infinitely  often.  By  assumption  that  cycle  contains 
an  acceptance  state  in  I,  so  a  is  accepted  by  I.  I  accepts  only  failing  scenarios  of  M  with  respect 
to  P,  therefore  a  fails  with  respect  to  P. 

□ 


Given  a  Biichi  automaton  M  and  a  correctness  property  P,  let  I  =  M  n  ~^P  and  let  SCCj  = 
(C.  Er.C{))  be  the  SCC  graph  of  automaton  I.  Define  the  following  predicate  on  the  components  of 
SCCr: 


Qi  =  X  c  :  C  .  (c  contains  an  acceptance  cycle) 

Then  we  have  the  following  Corollary  to  Lemma  3.17: 

Corollary  3.2  A  finite  execution  7  =  soto  •  •  •  tn~isn  In  M  is  f-representative  with  respect  to  P  if 
and  only  if  in  the  SCC  graph  SCCj,  Q V  is  true  for  the  component  containing  state  sn. 

The  algorithm  I-RSAutomaton  for  generating  i-exhaustive  rs-automata  is  shown  in  Figure  3.6. 
It  is  almost  identical  to  the  algorithm  for  generating  f-exhaustive  automata.  The  output  of  the 
algorithm  accepts  only  i-scenarios  and  infinite  r-scenarios. 

The  algorithm  begins  by  computing  the  intersection  automaton  I  =  M  n  -iP  and  its  strongly 
connected  component  graph  SCCj.  Lines  4  and  5  define  two  predicates  on  the  components  in 
SCCi,  Qs  and  Q%.  The  predicate  Qs  identifies  nontrivial  SCCs  with  at  least  one  acceptance  state. 
The  predicate  Qi  identifies  SCCs  where  every  cycle  has  an  acceptance  state.  The  predicate  Qi 
can  be  evaluated  on  a  component  c  algorithmically  as  follows:  remove  all  acceptance  states  from 
component  c  and  look  for  a  cycle  involving  the  remaining  states  (e.g.  using  depth  first  search). 

Lines  6  and  7  compute  two  sets  of  components  using  the  RSET  algorithms.  Set  Cs  consists 
of  components  where  QQs  holds,  and  set  C,  consists  of  components  where  if]  holds.  Next,  the 
algorithm  splits  the  SCCs  of  /  into  four  classes: 

1.  Components  where  Q ^  is  true  and  (f\  is  false  (set  Ci). 

2.  Components  where  both  QQs  and  Q-  arc  true  (set  C 2). 

3.  All  other  components. 

Components  in  class  1  consist  of  states  from  which  it  is  possible  to  reach  an  acceptance  cycle. 
Furthermore,  since  Cf]  is  false  for  class  1  components,  by  Corollary  3.2  i-representative  scenarios 
cannot  terminate  inside  class  1  components.  It  is  therefore  necessary  to  include  class  1  components 
in  the  result  rs-automaton  in  their  entirety  (state  set  Si,  lines  10  and  13). 

Components  in  class  2  can  contain  initial  states.  In  all  of  those  initial  states  Q'(  is  true.  By 
Corollary  3.2  finite  scenarios  of  length  1  starting  and  ending  in  such  initial  states  arc  i-representative. 
The  algorithm  places  initial  states  from  class  2  components  in  the  result  rs-automaton  (state  set  S 2, 
lines  11  and  13).  As  has  been  noted  in  Section  3.3.4,  in  practice  class  2  components  are  unlikely  to 
contain  initial  states. 
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Input: 

M  =  (S.  t.  So.  S.  S.  D)  -  the  model  (Biichi  automaton) 

P  -  an  LTL  correctness  property 

Output: 

Rs-automaton  Mfi  =  ( Mr ,  0,  Sn-) 

Algorithm:  I-RSAutomaton (M,P) 

-  Convert  —> p  to  a  Biichi  automaton  Np 

(1)  Np  4—  ConvertToBiichi(-iP); 

-  Construct  an  intersection  of  M  and  Np 

(2)  I  =  ( S\t\  S*S*,  Si  Sj,  Dl  <-  M  n  NP- 

-  Compute  the  SCC  graph  of  I 

(3)  SCCi  =  (C,Et,C0)  «-  Tar  janSCC(I); 

-  Define  a  predicate  on  C  identifying  SCCs  with  an  acceptance  cycle 

(4)  Qs  4—  A  c  :  C.(c  has  an  acceptance  cycle)-, 

-  Define  a  predicate  on  C  identifying  SCCs  where  every  cycle  has  an  acceptance  state 

(5)  Qi  <—  A  c  :  C.(V/3  G  c  |  f3  is  a  cycle  .  (3s  G  f3  .  s  G  51*)); 

-  Compute  the  set  of  components  where  is  true 

(6)  «-  RSET-EXI  STENTIAL(C,  Et,Cq,  Qs)l 

-  Compute  the  set  of  components  where  Cf[  is  true 

(7)  Ci  4-  rset-universal(C,  Et, C0,  Qi)-, 

-  Compute  the  set  of  components  where  Q3S  is  true  and  if]  is  false 

(8  )C1*-C8\Ci-, 

-  Compute  the  set  of  components  where  both  and  (f)  are  true 

(9)  C2  Cs  n  Ci  n  C0; 

-  Gather  all  the  states  in  C\ 

(10)  5!  -  UceCi  c; 

-  Isolate  the  initial  states  of  C, 2 

(11)  s2^-  si0  <1  (Ucec2  C); 

-  Isolate  the  border  states  of'C-> 

(12)  S3  <-  G  (UceC2  c)  |  3s'  G  Si  .  (s',  s)  G  r% 

-  The  rs-automaton  includes  all  states  in  C\,  initial  states  ofC2,  and  border  states  ofC-2 

(13)  Ss  ♦-  Si  u  s2  u  S3. 

-  Restrict  the  transition  relation  to  the  states  in  Ss 

(14)  Ts  <-  Tl  n  (Ss  X  Ss). 

(15)  Mr  4-  ( Ss,rs ,  s*  n  Ss,  si  n  ss,  s2  u  s3,  &). 

-  The  set  Sn-  of  i-scenario  end-points  consists  of  initial  states  in  C2  and  border  states  ofC2  and  C3 

(16)  Return  Mrp  =  (Mr,  0,  S2  U  S3). 


Figure  3.8:  Algorithm  for  Generating  I-Exhaustive  RS-Automata 
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Since  Q V  is  true  in  all  of  the  states  in  class  2  components.  Corollary  3.2  guarantees  that  any 
finite  scenario  ending  in  a  class  2  component  is  i-representative.  Therefore,  it  is  sufficient  to  include 
in  the  result  rs-automaton  only  those  border  states  of  class  2  components  that  have  an  incoming 
transition  from  a  class  1  component  (state  set  S3,  lines  12  and  13).  Finally,  components  in  class  3 
have  no  acceptance  states,  so  they  do  not  participate  in  any  infinite  scenarios  accepted  by  automaton 
I  and  can  therefore  be  safely  discarded. 

Lemmas  3.18,  3.19,  3.20,  and  3.21  and  Theorem  3.6  refer  to  inputs  and  outputs  of  the  algorithm 
I-RSAutomaton.  We  omit  the  proofs,  which  are  analogous  to  the  proofs  of  Lemmas  3.13,  3.14, 
3.15,  and  3.16  and  Theorem  3.5. 

Lemma  3.18  (i-rs-sound)  If  7  is  an  execution  in  the  the  output  rs-automaton  Mf  =  ( Mr ,  0,  Sn-), 
then  7  is  either  a  failing  scenario  of  the  input  automaton  M  with  respect  to  P  or  an  i-prefix  of  a 
failing  scenario  of  M. 

Lemma  3.19  (i-rs-exhaustive)  If  an  infinite  execution  a  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D ) 
violates  the  input  correctness  property  P,  then  for  the  output  rs-automaton  Mf  =  ( Mr ,  0,  Sri )  one 
of  the  following  holds: 

1.  a  is  an  r-scenario  in  Mf,  or 

2.  an  in-prefix  of  a  is  an  i-scenario  in  Mfifor  some  value  ofn 

Lemma  3.20  (i-rs-succinct  state)  A  state  s  of  the  input  automaton  M  =  (S,  r,  So,  S,  S,  D)  is  a 
state  in  the  output  rs-automaton  Mf  =  ( Mr ,  0,  Sn)  if  and  only  if  there  is  an  infinite  execution  a  of 
the  input  automaton  M  =  (S,  r,  So,  S,  S,  D)  that  violates  the  correctness  property  P  and  contains 
s. 


Lemma  3.21  (i-rs-succinct  transition )  A  transition  t  =  (s\,  s  2)  £  r  of  the  input  automaton  M  = 
(S,  r,  So,  S,  S,  D)  is  a  transition  in  the  output  rs-automaton  Mrp  =  (Mr,0,Sn)  if  and  only  if 
there  is  an  infinite  execution  a  of  the  input  automaton  M  =  (S,t,  Sq,  S,  S,  D)  that  violates  the 
correctness  property  P  and  contains  t. 


Theorem  3.6  The  algorithm  in  Figure  3.8  generates  sound,  i-exhaustive,  and  succinct  rs-automata. 
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Chapter  4 


Performance 


In  this  chapter  we  evaluate  the  performance  of  the  scenario  graph  algorithms  and  consider  imple¬ 
mentation  variations  that  trade  off  running  time  against  state  space  coverage  and  memory  consump¬ 
tion.  The  traceback  mode  variation  described  in  Section  4.3.2  is  a  new  scientific  contribution  to 
explicit  state  model  checking. 


4.1  Symbolic  Algorithm  Performance 

Preliminary  tests  on  an  implementation  of  the  symbolic  scenario  graph  algorithm  indicated  that 
the  symbolic  approach  does  not  work  well  on  the  attack  graph  application  that  we  used  as  our 
primary  testing  environment.  For  example,  we  could  not  initially  process  a  small  model  with  six 
hundred  reachable  states.  The  binary  decision  diagrams  constructed  for  the  model  reached  unman¬ 
ageable  size  due  to  complex  interactions  between  state  variables.  The  model  went  through  the  model 
checker  after  we  found  a  suitable  variable  ordering;  however,  it  still  took  the  symbolic  algorithm 
over  two  hours  to  process  the  model.  Based  on  the  preliminary  results,  we  did  not  conduct  a  thor¬ 
ough  study  of  the  symbolic  algorithm's  performance,  choosing  instead  to  focus  on  the  explicit-state 
implementation. 


4.2  Explicit-State  Algorithm  Performance 

For  clarity  of  presentation,  the  explicit-state  algorithm  in  Figure  3.4  is  split  into  several  separate 
phases.  Step  2  computes  an  intersection  of  two  automata,  step  3  computes  the  SCC  graph  of  the 
result,  step  5  traverses  the  SCC  graph  and  marks  components  that  will  be  included  in  the  output 
scenario  automaton,  and  finally  steps  6  and  7  bring  together  the  pieces  of  the  output  automaton. 
In  practice,  the  tasks  in  steps  2,  5,  6,  and  7  can  be  performed  in  parallel  with  step  3,  i.e.  during 
the  search  for  strongly  connected  components.  All  of  the  steps  can  be  accomplished  using  a  single 
depth-first  traversal  of  the  reachable  state  space.  Thus,  the  explicit-state  algorithm  has  the  same 
asymptotic  complexity  as  a  depth-first  traversal  of  the  state  space  graph.  Asymptotic  running  time 
is  T(E )  =  S(E)0(E )  [90],  where  E  is  the  number  of  edges  in  the  graph  and  S(E)  is  the  running 
time  of  the  search  algorithm  used  to  detect  when  the  current  state  in  the  search  had  been  found 
previously.  If  a  hash  table  is  used  to  store  states,  searching  for  previously  explored  nodes  can  be 
done  in  amortized  constant  time,  so  the  overall  model  checking  algorithm  runs  in  amortized  O(E) 
time. 
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Figure  4.1:  Explicit  State  Model  Checker  Performance 
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We  tested  the  performance  of  the  explicit-state  model  checking  algorithm  implementation  on 
108  test  cases.  The  largest  test  case  produced  a  scenario  graph  with  46  thousand  states  and  838 
thousand  edges  and  took  38  seconds  to  generate. 

Our  empirical  tests  bear  out  the  asymptotic  bound.  The  test  machine  was  a  lGhz  Pentium  III 
with  1GB  of  RAM,  running  Red  Hat  Linux  7.3.  Figure  4.1(a)  plots  running  time  of  the  implemen¬ 
tation  against  the  number  of  edges  in  the  reachable  state  graph.  The  roughly  linear  relationship 
between  the  number  of  edges  and  running  time  is  evident  in  the  plot,  and  confirmed  by  statistical 
analysis.  Least-squares  linear  regression  estimates  the  linear  coefficient  at  1.12  x  10”4,  or  0.112 
seconds  of  running  time  per  one  thousand  explored  edges  of  the  reachable  state  graph.  The  coef¬ 
ficient  of  determination  ( R 2)  is  0.9967,  which  means  lineal-  regression  explains  almost  all  of  the 
variability  in  the  data. 


4.3  Explicit-State  Algorithm  Memory  Usage 

LTL  model  checking  is  PSPACE-complete  [19],  and  memory  is  often  the  factor  limiting  the  size  of 
the  model  that  we  are  able  to  verify.  To  ensure  completeness  of  the  search,  all  previously-explored 
states  are  kept  until  the  end,  so  that  we  can  detect  when  the  search  reaches  a  state  that  had  been 
explored  previously.  Figure  4.1(b)  plots  memory  consumption  against  the  number  of  edges  in  the 
reachable  state  graph. 

All  of  the  work  in  the  explicit-state  algorithm  for  scenario  graph  generation  (Section  3.3,  Fig¬ 
ure  3.4)  is  done  in  a  single  depth-first  search  pass  through  the  state  space  of  the  model.  The  basic 
DFS  procedure  is  shown  in  Figure  4.2.  To  traverse  the  state  space  starting  in  the  initial  state,  we 
invoke  the  DFS-Visit  procedure  with  the  initial  state  as  an  argument.  As  the  procedure  discovers 
new  nodes  in  the  state  space  graph,  it  stores  them  in  a  table  T.  Before  further  exploring  a  new  node, 
the  algorithm  consults  the  table  to  verify  that  the  node  has  not  been  encountered  previously. 

Both  performance  and  memory  requirements  of  the  algorithm  depend  critically  on  the  imple¬ 
mentation  of  table  insertion  (procedure  Insert)  and  lookup  (procedure  Explored).  If  saving 
memory  is  critical,  the  table  implementation  can  keep  partial  information  about  each  state,  achiev¬ 
ing  significant  savings. 

The  simplest  implementation  choice  is  a  classic  hash  table.  The  table  stores  the  entire  specifi¬ 
cation  for  each  state.  With  a  suitable  hash  function,  both  insertion  and  lookup  operations  require 
amortized  0(1)  time.  The  DFS  algorithm  therefore  takes  amortized  O(E)  time,  where  E  is  the 
number  of  edges  in  the  state  graph.  This  implementation  guarantees  full  coverage  of  the  state  space, 
at  the  cost  of  high  memory  requirements.  Each  state  kept  in  the  table  consumes  the  full  number  of 
bits  required  to  represent  it.  The  results  in  Figure  3.4  use  a  hash  table. 


DFS-V±s±t(i;) 

1  for  each  child  iiofr 

2  if  not  Explored(T,  u ) 

3  Insert  (T,v,u) 

4  DFS-Visitfu) 


Figure  4.2:  Basic  Depth-First  Search 
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4.3.1  Hashcompact  Search  Mode 

The  hashcompact  mode ,  first  proposed  by  Wolper  and  Leroy  [98]  and  subsequently  implemented  in 
the  SPIN  model  checker,  is  an  alternative  way  of  keeping  state  information.  Instead  of  representing 
the  entire  state,  only  a  hash  value  for  each  visited  state  is  kept  in  the  table.  The  Explored  proce¬ 
dure  looks  for  the  hash  of  the  current  state  in  the  table,  returning  true  whenever  the  hash  is  found. 
This  method  retains  the  O(E)  asymptotic  performance  bound,  but  does  not  offer  guarantees  of  full 
state  coverage,  as  hash  function  collisions  may  cause  the  algorithm  to  skip  unexplored  states  during 
the  search.  The  advantage  is  that  it  is  only  necessary  to  keep  a  small,  fixed  number  of  bits  per  state 
in  the  table,  drastically  reducing  memory  requirements.  There  is  some  wiggle  room:  increasing  the 
number  of  bits  produced  by  the  hash  function  produces  a  corresponding  reduction  in  the  probability 
of  hash  collisions. 

4.3.2  Traceback  Search  Mode 

The  traceback  mode  is  a  new  technique  for  explicit  state  space  search.  The  mode  retains  collision 
detection  functionality  in  the  table  implementation,  reducing  memory  requirements  at  the  cost  of 
performance.  In  this  mode  the  table  keeps  the  minimum  amount  of  information  necessary  to  recon¬ 
struct  a  node  s  whenever  a  potential  collision  must  be  resolved.  Specifically,  it  is  sufficient  to  retain 
a  pointer  to  the  parent  of  s  and  the  identity  of  the  state  machine  transition  that  generated  s.  The 
parent  pointer  identities  the  node  from  which  s  was  first  found  during  depth-first  search. 

When  the  algorithm  detects  a  potential  collision  at  node  s,  it  reconstructs  the  DFS  stack  as  it 
was  when  s  was  first  found.  The  reconstruction  procedure  first  pushes  node  s  on  the  stack,  then 
the  parent  of  s,  then  the  parent  of  the  parent  of  s,  etc.,  until  an  initial  state  is  found.  The  procedure 
then  traverses  the  stack,  using  the  stored  identity  of  the  state  machine  transition  at  each  node  to 
reconstruct  the  full  state  at  the  node. 

Assuming  constant-size  pointers,  this  method  requires  a  small,  fixed  amount  of  memory  per 
node  and  guarantees  complete  state  space  coverage.  In  the  traceback  mode  collision  detection  and 
resolution  (and  by  extension,  the  Explored  operation)  reconstructs  the  entire  DFS  stack  each  time 
it  is  invoked.  Let  the  diameter  of  the  state  space  graph  be  the  longest  possible  cycle-free  path  that 
stalls  in  an  initial  state.  Then,  the  Explored  operation  is  linear  in  the  diameter  d  of  the  graph  in 
the  worst  case,  resulting  in  an  0(E)0(d)  asymptotic  performance  bound  for  the  main  algorithm. 

In  summary,  hashcompact  and  traceback  modes  offer  trade-offs  between  memory  consumption, 
performance,  and  state  space  coverage.  The  trade-offs  arc  shown  in  the  table  below.  S  represents 
the  number  of  bits  necessary  to  store  a  single  state  of  the  model  without  loss  of  information. 


Mode 

Full  state 

Hashcompact 

Traceback 

Amortized  Running  Time 

0(E) 

0(E) 

0(E)0(d) 

Memory  per  State 

S 

8  bytes 

14  bytes 

Coverage 

Full 

Partial 

Full 

4.4  Hashcompact  Mode  Trade-offs 

We  implemented  both  hashcompact  and  traceback  modes  in  our  scenario  graph  generator.  In  terms 
of  running  time  and  memory  requirements,  the  hashcompact  mode’s  performance  is  identical  to  the 
the  regular,  full  state  mode,  as  shown  in  Figure  4.3(a). 
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Figure  4.3:  Hashcompact  Mode  Trade-offs 
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Hashcompact’s  memory  footprint  is  significantly  smaller.  Figure  4.3(b)  shows  the  memory 
savings  of  the  hashcompact  mode  vs.  regular  search. 

In  exchange  for  memory  efficiency,  the  hashcompact  mode  sacrifices  correctness  guarantees 
offered  by  the  other  modes.  In  general,  the  hashcompact  search  algorithm  is  neither  sound  nor 
exhaustive.  We  demonstrate  this  claim  with  a  concrete  example. 

Let  si,  s 2,  and  S3  be  distinct  states  of  the  automaton  I  =  M  D  Np  and  h  the  hash  function  used 
by  the  hash  table.  Assume  further  that  h(s  1)  =  //(s'9),  there  is  a  transition  t  =  (s 3,  s 2)  in  the  model, 
and  the  search  discovers  state  si  prior  to  arriving  at  S3.  Upon  discovering  transition  t,  the  search 
procedure  checks  to  see  if  state  S2  had  been  visited  previously.  Due  to  the  hash  collision  between  si 
and  S2,  the  algorithm  erroneously  assumes  that  s 2  had  already  been  discovered,  and  adds  an  edge 
from  S3  to  si  to  the  state  graph. 

The  effects  of  this  step  arc  twofold.  The  search  fails  to  consider  state  S2  (and,  potentially,  the 
entire  subgraph  rooted  at  S2),  resulting  in  an  incomplete  search.  Further,  the  added  edge  from  S3  to 
si  may  not  be  a  legitimate  transition  of  I.  In  the  resulting  scenario  graph,  all  executions  containing 
this  transition  will  represent  non-existent  failure  scenarios.  The  hashcompact  method  can  thus  be 
seen  as  a  risky  heuristic  search  that  makes  no  correctness  claims  but  enables  efficient  exploration  of 
larger  state  spaces  than  would  be  possible  with  full  state  space  search. 


4.5  Traceback  Mode  Trade-offs 

Figures  4.4(a,b)  plot  performance  and  memory  requirements  of  the  regular  (full  state)  and  traceback 
modes  on  the  same  set  of  models.  The  figures  demonstrate  the  traceback  trade-off  between  perfor¬ 
mance  and  memory  consumption.  The  mode  requires  more  time  than  regular  search  due  to  stack 
reconstruction  overhead,  but  has  a  smaller  memory  footprint  compared  with  the  full  state  mode. 

We  have  noted  that  the  asymptotic  performance  bound  for  the  traceback  algorithm  is  0(E)0(d), 
so  the  linear  trend  apparent  in  Figure  4.4(a)  requires  an  explanation.  The  O(d)  factor  comes  from 
the  state  reconstruction  step  inside  the  Explored  operation.  The  average  per-edge  overhead  of 
this  step  depends  on  two  factors: 

1 .  the  average  depth  of  the  reconstructed  node,  and 

2.  the  percentage  of  edges  that  require  a  reconstruction. 

Figure  4.5(a)  plots  the  total  graph  diameter  and  the  average  depth  of  a  node  that  requires  reconstruc¬ 
tion  against  the  number  of  reachable  transitions.  After  some  initial  growth,  both  the  graph  diameter 
and  the  average  depth  of  a  reconstructed  node  stay  approximately  constant  over  the  majority  of  the 
sample  models,  so  the  size  of  each  reconstructed  DFS  stack  does  not  grow  appreciably  as  the  size 
of  the  state  space  increases. 

Figure  4.5(b)  plots  the  proportion  of  edges  that  require  a  reconstruction,  which  is  the  second 
factor  affecting  the  asymptotic  bound.  The  proportion  rapidly  reaches  about  95%  and  stays  there 
over  the  majority  of  the  sample  models.  This  means  that  the  second  factor  affecting  the  asymptotic 
bound  also  does  not  grow  with  the  size  of  the  model.  Together,  for  our  data  set  these  two  factors 
amount  to  roughly  constant  per-edge  overhead  for  node  reconstruction,  which  accounts  for  the  linear 
trend  in  Figure  4.4(a). 
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Chapter  5 


Scenario  Graphs:  Related  Work 


For  over  twenty  years,  model  checking  techniques  have  been  studied  extensively  by  many  re¬ 
searchers.  Because  model-checking  can  be  performed  automatically,  it  is  preferable  to  deductive 
verification,  whenever  it  can  be  applied.  One  of  the  most  attractive  features  of  model  checking  as 
a  computer  aided  verification  technique  is  the  ability  to  generate  counterexamples  that  highlight 
for  the  user  the  often  subtle  errors  in  concurrent  systems.  To  our  knowledge,  we  arc  the  first  to 
study  algorithms  that  aim  specifically  to  generate  the  full  set  of  counterexamples.  Nevertheless,  in 
this  chapter  we  mention  some  recent  related  work.  For  completeness,  we  include  a  brief  survey  of 
traditional  model  checking  techniques;  material  for  the  survey  is  drawn,  with  revisions,  from  [19]. 


5.1  Model  Checking 

The  principal  validation  methods  for  complex  systems  are  simulation,  testing,  deductive  verification, 
and  model  checking.  Simulation  and  testing  both  involve  making  experiments  before  deploying  the 
systems  in  the  field.  While  simulation  is  performed  on  an  abstraction  or  a  model  of  the  system, 
testing  is  performed  on  the  actual  product.  These  methods  can  be  a  cost-efficient  way  to  find  many 
errors.  However,  checking  of  all  possible  interactions  and  potential  pitfalls  using  simulation  and 
testing  techniques  is  rarely  possible. 

Model  checking  is  an  automatic  technique  for  verifying  finite-state  reactive  systems,  such  as 
sequential  circuit  designs  and  communication  protocols.  Specifications  arc  expressed  in  a  proposi¬ 
tional  temporal  logic,  and  the  reactive  system  is  modeled  as  a  state-transition  graph.  An  efficient 
search  procedure  is  used  to  determine  if  the  specifications  arc  satisfied  by  the  state-transition  graph. 
The  technique  was  originally  developed  in  1981  by  Clarke  and  Emerson  [32,  18].  Quielle  and 
Sifakis  [76]  independently  discovered  a  similar  verification  technique  shortly  thereafter. 

The  model  checking  technique  has  several  important  advantages  over  mechanical  theorem  provers 
or  proof  checkers  for  verification  of  circuits  and  protocols.  The  benefit  of  restricting  the  problem 
domain  to  finite  state  models  is  that  verification  can  be  performed  automatically.  Typically,  the  user 
provides  a  high  level  representation  of  the  model  and  the  correctness  specification  to  be  checked. 
The  procedure  uses  an  exhaustive  search  of  the  state  space  of  the  system  to  determine  if  the  specifi¬ 
cation  is  true  or  not.  Given  sufficient  resources,  the  procedure  will  either  terminate  with  the  answer 
true,  indicating  that  the  model  satisfies  the  specification,  or  give  a  counterexample  execution  that 
shows  why  the  correctness  formula  is  not  satisfied.  Moreover,  it  can  be  implemented  with  reason¬ 
able  efficiency  and  run  on  typical  desktops. 
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Temporal  logics  have  proved  to  be  useful  for  specifying  concurrent  systems,  because  they  can 
describe  the  ordering  of  events  in  time  without  introducing  time  explicitly.  Pnueli  [74]  was  the  first 
to  use  temporal  logic  for  reasoning  about  concurrency.  His  approach  involved  proving  properties  of 
the  program  under  consideration  from  a  set  of  axioms  that  described  the  behavior  of  the  individual 
statements  of  the  program.  The  method  was  extended  to  sequential  circuits  by  Bochmann  [95]  and 
Malachi  and  Owicki  [64] .  Since  proofs  were  constructed  by  hand,  the  technique  was  often  diffi¬ 
cult  to  use  in  practice.  The  introduction  of  temporal-logic  model  checking  algorithms  by  Clarke 
and  Emerson  [32,  33]  in  the  early  1980s  allowed  this  type  of  reasoning  to  be  automated.  Because 
checking  that  a  single  model  satisfies  a  formula  is  much  easier  than  proving  the  validity  of  a  for¬ 
mula  for  all  models,  it  was  possible  to  implement  this  technique  very  efficiently.  The  algorithm 
developed  by  Clarke  and  Emerson  for  the  branching-time  logic  CTL  was  polynomial  in  both  the 
size  of  the  model  determined  by  the  program  under  consideration  and  in  the  length  of  its  specifica¬ 
tion  in  temporal  logic.  They  also  showed  how  fairness  [38]  could  be  handled  without  changing  the 
complexity  of  the  algorithm.  This  was  an  important  step  in  that  the  correctness  of  many  concurrent 
programs  depends  on  some  type  of  fairness  assumption;  for  example,  absence  of  starvation  in  a  mu¬ 
tual  exclusion  algorithm  may  depend  on  the  assumption  that  each  process  makes  progress  infinitely 
often. 

The  first  step  in  program  verification  is  to  come  up  with  a  formal  specification  of  the  program. 
In  a  finite  state  system,  each  state  is  characterized  by  a  finite  amount  of  information,  and  this  in¬ 
formation  can  be  described  by  certain  atomic  proposition.  This  means  that  a  finite-state  program 
can  be  viewed  as  a  finite  propositional  Kripke  structure  and  that  it  can  be  specified  using  proposi¬ 
tional  temporal  logic.  Thus,  to  verify  the  correctness  of  the  program,  one  has  only  to  check  that 
the  program,  viewed  as  a  finite  Kripke  structure,  satisfies  (is  a  model  of)  the  propositional  temporal 
logic  specification.  This  was  the  approach  of  the  early  model  checking  algorithms  [32,  33],  further 
studied  in  [60,  34,  35]. 

At  roughly  the  same  time  Quielle  and  Sifakis  [76]  gave  a  model  checking  algorithm  for  a  subset 
of  the  branching-time  temporal  logic  CTL,  but  they  did  not  analyze  its  complexity.  Later  Clarke, 
Emerson,  and  Sistla  [18]  devised  an  improved  algorithm  that  was  linear  in  the  product  of  the  length 
of  the  formula  and  the  size  of  the  state  transition  graph.  Early  model  checking  systems  were  able  to 
check  state  transition  graphs  with  between  104  and  105  states.  In  spite  of  these  limitations,  model 
checking  systems  were  used  successfully  to  find  previously  unknown  errors  in  several  published 
circuit  designs. 

Sistla  and  Clarke  [82,  83]  analyzed  the  model  checking  problem  for  a  variety  of  temporal  logics 
and  showed,  in  particular,  that  for  linear  temporal  logic  (LTL)  the  problem  was  PSPACE-complete. 
Pnueli  and  Lichtenstein  [60]  reanalyzed  the  complexity  of  checking  linear-time  formulas  and  dis¬ 
covered  that  although  the  complexity  appeal's  exponential  in  the  length  of  the  formula,  it  is  linear  in 
the  size  of  the  global  state  graph.  Based  on  this  observation,  they  argued  that  the  high  complexity  of 
linear-time  model  checking  might  still  be  acceptable  for  short  formulas.  The  same  year,  Fujita  [36] 
implemented  a  tableau  based  verification  system  for  LTL  formulas  and  showed  how  it  could  be  used 
for  hardware  verification. 

CTL*  is  a  very  expressive  logic  that  combines  both  branching-time  and  linear-time  operators. 
The  model  checking  problem  for  this  logic  was  first  considered  in  a  paper  by  Clarke,  Emerson,  and 
Sistla  [17],  where  it  was  shown  to  be  PSPACE-complete,  establishing  that  it  is  in  the  same  general 
complexity  class  as  the  model  checking  problem  for  LTL.  This  result  can  be  sharpened  to  show  that 
CTL*  and  LTL  model  checking  are  of  the  same  algorithmic  complexity  (up  to  a  constant  factor)  in 
both  the  size  of  the  state  graph  and  the  size  of  the  formula.  Thus,  for  purposes  of  model  checking, 
there  is  no  practical  complexity  advantage  to  restricting  oneself  to  a  linear  temporal  logic  [35]. 
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In  the  original  implementation  of  the  model  checking  algorithm,  transition  relations  were  rep¬ 
resented  explicitly  by  adjacency  lists.  For  concurrent  systems  with  small  numbers  of  processes,  the 
number  of  states  was  usually  fairly  small,  and  the  approach  was  often  quite  practical.  In  systems 
with  many  concurrent  parts,  however,  the  number  of  states  in  the  global  state  transition  graph  was 
too  large  to  handle.  The  possibility  of  verifying  systems  with  realistic  complexity  changed  dramat¬ 
ically  in  the  late  1980s  with  the  discovery  of  how  to  represent  transition  relations  using  ordered 
binary  decision  diagrams  (OBDDs)  [8].  This  discovery  was  made  independently  by  three  research 
teams  [11,  24,  73].  The  discovery  made  it  possible  to  avoid  explicitly  constructing  the  state  graph 
of  the  concurrent  system.  OBDDs  provide  a  canonical  form  for  boolean  formulas  that  is  often  sub¬ 
stantially  more  compact  than  conjunctive  or  disjunctive  normal  form,  and  very  efficient  algorithms 
have  been  developed  for  manipulating  them.  A  symbolic  model  checker  extracts  a  transition  system 
represented  as  an  OBDD  from  a  program  and  uses  an  OBDD-based  search  algorithm  to  determine 
whether  the  system  satisfies  its  specification.  Because  the  symbolic  representation  captures  some 
of  the  regularity  in  the  state  space  determined  by  circuits  and  protocols,  for  those  applications  it 
is  possible  to  verify  systems  with  an  extremely  large  number  of  states  many  orders  of  magnitude 
larger  than  could  be  handled  by  explicit- state  algorithms.  Various  refinements  of  the  OBDD-based 
techniques  have  pushed  the  state  count  up  to  more  than  10120  [13,  12]. 

Alternative  techniques  for  verifying  concurrent  systems  have  been  proposed  by  a  number  of 
other  researchers.  Many  of  these  approaches,  including  our  treatment  of  explicit-state  algorithms  in 
Chapter  3,  use  automata  for  specifications  as  well  as  for  implementations.  Vardi  and  Wolper  [93] 
first  proposed  the  use  of  cc-automata  (automata  over  infinite  words)  for  automated  verification.  They 
showed  how  the  linear  temporal  logic  model  checking  problem  could  be  formulated  in  terms  of 
language  containment  between  w-automata.  The  basic  idea  underlying  this  approach  is  that  for  any 
temporal  formula  we  can  construct  an  automaton  that  accepts  precisely  the  computations  that  satisfy 
the  formula.  The  connection  between  propositional  temporal  logic  and  formal  language  theory  has 
been  quite  extensively  studied  [38,  70,  82,  81,  97],  This  connection  is  based  on  the  fact  that  a 
computation  is  essentially  an  infinite  sequence  of  states.  Since  every  state  is  completely  described 
by  a  finite  set  of  atomic  propositions,  a  computation  can  be  viewed  as  an  infinite  word  over  the 
alphabet  of  truth  assignments  to  the  atomic  propositions.  Thus,  temporal  logic  formulas  can  be 
viewed  as  finite-state  acceptors.  More  precisely,  given  any  propositional  temporal  formula,  one  can 
construct  a  finite  automaton  on  infinite  words  [10,  66]  that  accepts  precisely  the  sequences  satisfied 
by  the  formula  [97], 

To  use  the  above  connection,  we  view  a  finite-state  program  as  a  finite-state  generator  of  infinite 
words.  Thus,  if  M  is  the  program  and  P  is  the  specification,  then  M  meets  P  if  every  infinite 
word  generated  by  M,  viewed  as  a  finite-state  generator,  is  accepted  by  P  viewed  as  a  finite-state 
acceptor.  This  reduces  the  model-checking  problem  to  a  purely  automata-theoretic  problem:  the 
problem  of  determining  if  the  automaton  M  n  ->P  is  empty,  i.e.,  if  it  accepts  no  word. 

Because  the  same  type  of  model  is  used  for  both  implementation  and  specification,  an  imple¬ 
mentation  at  one  level  can  also  be  used  as  a  specification  for  the  next  level  of  refinement.  The  use 
of  language  containment  is  implicit  in  the  work  of  Kurshan  [2],  which  ultimately  resulted  in  the 
development  of  the  verifier  COSPAN  [41,  57,  42,  58].  Language  containment  is  also  the  founda¬ 
tion  for  the  popular  SPIN  model  checker  [50,  44,  49].  Other  notions  of  conformance  between  the 
automata  have  also  been  considered,  including  observational  equivalence  and  various  refinement 
relations  [21], 
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5.2  Improvements  to  Explicit-State  Search 

Model  checkers  based  on  language  containment  typically  construct  an  explicit  representation  for 
each  state  as  they  search  the  state  space  of  the  model.  The  bitstate  hashing  (or  supertrace)  tech¬ 
nique  was  introduced  in  1987  [5 1]  and  elaborated  in  [50,  44]  as  a  method  to  increase  the  quality  of 
verification  by  explicit  state  search  for  large  models  that  exhaust  machine  memory.  The  technique 
performs  relatively  high-coverage  verifications  within  a  memory  area  that  may  be  orders  of  magni¬ 
tude  smaller  than  required  for  exhaustive  verifications.  The  method  has  made  it  possible  to  apply 
formal  verification  techniques  to  problems  that  would  normally  have  remained  beyond  the  scope  of 
explicit-state  tools,  e.g.  [16,  15,  61,  47,  46]. 

The  original  bitstate  hashing  technique  works  by  minimizing  the  amount  of  memory  necessary 
for  keeping  the  table  of  previously-visited  states.  The  table  is  a  table  of  bits  all  initially  set  to  0.  To 
add  a  state  to  the  table,  one  hashes  the  state  description  into  an  address  in  the  table  and  sets  the  bit 
at  this  address  to  1 .  To  determine  if  a  state  is  in  the  table,  one  applies  the  hash  function  to  the  state 
and  checks  whether  the  bit  appealing  at  the  computed  address  is  1. 

The  drawback  of  the  method  is  that  the  hash  function  can  compute  the  same  address  for  distinct 
states  and  hence  one  can  wrongly  conclude  that  a  state  has  been  visited,  whereas  one  has  actually 
encountered  a  hash  collision.  Wolper  and  Leroy  determined  that  the  reduction  in  memory  use 
comes  at  the  price  of  a  high  probability  of  ignoring  part  of  the  state  space  and  hence  of  missing 
existing  errors  [98].  They  also  proposed  an  improvement  to  the  basic  scheme,  which  they  called 
the  hashcompact  method.  The  idea  is  to  increase  the  number  of  bits  stored  in  the  table  and  to  use  a 
collision  resolution  scheme  for  the  values  stored.  Wolper  and  Leroy  claimed  collision  probabilities 
of  10”3  to  10  3  with  memory  use  on  the  order  of  40-100  bits  per  stored  state. 

Holzmann  [48]  provided  an  analysis  of  the  different  valiants  of  bitstate  hashing.  He  compared 
the  original  technique  with  two  alternative  valiants,  Wolper  and  Leroy’s  hashcompact  and  the  mul¬ 
tihash  technique  introduced  by  Stern  and  Dill  [87,  88].  According  to  his  results,  the  alternatives 
improve  over  the  original  bitstate  hashing  method  in  a  limited  domain  of  applications.  The  original 
bitstate  hashing  scheme  tends  to  make  fewer  assumptions  about  its  domain  of  application,  and  can 
outperform  its  competitors  for  large  problem  sizes  and/or  small  memory  bounds.  Holzmann  further 
showed  that  a  variation  of  the  multihash  technique,  sequential  bitstate  hashing,  that  can  outperform 
the  other  algorithms  in  terms  of  state  space  coverage  when  applied  to  very  large  problem  sizes,  at 
the  cost  of  increased  running  time.  At  the  limit,  sequential  bitstate  hashing  can  incur  exponentially 
increasing  runtime  cost  to  approximate  the  results  of  an  exhaustive  search  in  shrinking  memory. 

We  have  implemented  the  hashcompact  technique  for  reference  and  performance  comparisons 
only,  and  do  not  claim  to  improve  upon  the  existing  bitstate  hashing  work.  Our  contribution,  the 
traceback  method,  is  an  alternative  means  of  trading  performance  for  memory  savings.  The  method 
guarantees  full  coverage  of  the  state  space,  uses  a  small,  fixed  number  of  bits  per  stored  state,  and 
incurs  only  a  polynomial  runtime  cost  (the  runtime  is  quadratic  in  the  worst  case). 


5.3  Counterexamples 

Modern  model  checkers  produce  counterexamples  of  a  simple  non-branching  structure,  i.e.,  finite 
or  infinite  paths  (or  linear  counterexamples  [20]).  Despite  the  importance  of  counterexamples  in 
the  popularity  of  model  checking,  they  have  not  been  studied  intensively.  Algorithms  for  generating 
lineal-  counterexamples  and  their  complexity  were  first  studied  by  Clarke  et.  al.  [31]  and  Hojati  et. 
al.  [43].  Unfortunately,  only  properties  in  the  weak  temporal  logic  fragment  ACTL  n  LTL  have 
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lineal- counterexamples  [63].  Results  by  Buccafurri  et.  al.  [9]  show  that  recognizing  ACTL  formulas 
with  linear  counterexamples  is  PSPACE-hard.  Thus,  linear  counterexamples  are  not  complete  for 
ACTL  (i.e.,  not  every  violation  of  an  ACTL  specification  is  witnessed  by  a  linear  counterexample), 
and  recognizing  the  cases  where  linear  counterexamples  are  applicable  at  all  is  hard. 

In  the  most  comprehensive  effort  to  classify  non-linear  counterexamples,  Clarke  et.  al.  intro¬ 
duced  a  general  framework  for  counterexamples  [20].  They  determined  that  the  general  form  of 
counterexamples  for  ACTL  has  a  tree-like  structure,  and  showed  that  a  large  class  of  temporal  log¬ 
ics  admits  counterexamples  with  a  tree-like  transition  relation.  This  class  includes  the  Buy  One 
Get  One  Free  Logic  by  Grumberg  and  Kupferman  [56]  and  other  temporal  logics  with  cu-regular 
operators.  Our  scenario  graphs  consist  of  linear  counterexamples  only;  incorporating  non-linear 
counterexamples  remains  as  future  work. 


Part  II 

Attack  Graphs 
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Introduction:  Attack  Graphs 


As  networks  of  hosts  continue  to  grow,  it  becomes  increasingly  more  important  to  automate  the 
process  of  evaluating  their  vulnerability  to  attack.  When  evaluating  the  security  of  a  network,  it  is 
rarely  enough  to  consider  the  presence  or  absence  of  isolated  vulnerabilities.  Large  networks  typi¬ 
cally  contain  multiple  platforms  and  software  packages  and  employ  several  modes  of  connectivity. 
Inevitably,  such  networks  have  security  holes  that  escape  notice  of  even  the  most  diligent  system 
administrator. 


Attack  Graph 


Figure  6. 1 :  Vulnerability  Analysis  of  a  Network 

To  evaluate  the  security  of  a  network  of  hosts,  a  security  analyst  must  take  into  account  the 
effects  of  interactions  of  local  vulnerabilities  and  find  global  security  holes  introduced  by  intercon¬ 
nection.  A  typical  process  for  vulnerability  analysis  of  a  network  is  shown  in  Figure  6.1.  First, 
scanning  tools  determine  vulnerabilities  of  individual  hosts.  Using  this  local  vulnerability  informa¬ 
tion  along  with  other  information  about  the  network,  such  as  connectivity  between  hosts,  the  analyst 
produces  an  attack  graph.  Each  path  in  an  attack  graph  is  a  series  of  exploits,  which  we  call  actions, 
that  leads  to  an  undesirable  state.  An  example  of  an  undesirable  state  is  a  state  where  the  intruder 
has  obtained  administrative  access  to  a  critical  host. 

A  typical  result  of  such  efforts  is  a  hand-drawn  “white  hoard”  attack  graph,  such  as  the  one 
produced  by  a  Red  Team  at  Sandia  National  Labs  for  DARPA’s  CC20008  Information  battle  space 
preparation  experiment  and  shown  in  Figure  6.2.  Each  box  in  the  graph  designates  a  single  intruder 
action.  A  path  from  one  of  the  boxes  at  the  top  of  the  graph  to  one  of  the  boxes  at  the  bottom  is 
a  sequence  of  actions  corresponding  to  an  attack  scenario.  At  the  end  of  any  such  scenario,  the 
intruder  has  broken  the  network  security  in  some  way.  The  graph  is  included  here  for  illustrative 
purposes  only,  so  we  omit  the  description  of  specific  details. 
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Figure  6.2:  Sandia  Red  Team  Attack  Graph 
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Attack  graphs  can  serve  as  a  useful  tool  in  several  areas  of  network  security,  including  intrusion 
detection,  defense,  and  forensic  analysis.  To  motivate  our  study  of  attack  graphs  as  an  application 
of  scenario  graph  generation  algorithms,  we  briefly  review  the  potential  applications  to  these  areas 
of  security. 

Alert  Correlation 

System  administrators  arc  increasingly  deploying  intrusion  detections  systems  (IDSs)  to  detect 
and  combat  attacks  on  their  network.  Such  systems  depend  on  software  sensor  modules  that  detect 
suspicious  events  and  issue  alerts.  Configuring  the  sensors  involves  a  trade-off  between  sensitivity 
to  intrusions  and  the  rate  of  false  alarms  in  the  alert  stream.  When  the  sensors  arc  set  to  report  all 
suspicious  events,  they  frequently  issue  alerts  for  benign  background  events.  On  the  other  hand, 
decreasing  sensor  sensitivity  reduces  their  ability  to  detect  real  attacks. 

To  deal  with  this  problem,  intrusion  detection  systems  employ  heuristic  algorithms  to  correlate 
alerts  from  a  large  pool  of  heterogeneous  sensors.  Researchers  have  suggested  that  looking  at  entire 
intrusion  scenarios  can  help  in  linking  disparate  low-level  alerts  into  aggregate  super- alerts  [25,  91]. 
We  think  that  alert  aggregation  can  be  even  more  effective  with  full  attack  graphs. 

Defense 

A  further  benefit  of  attack  graphs  is  that  they  can  help  analyze  potential  effectiveness  of  many 
different  security  measures  offline.  In  Chapters  8  and  10  we  show  how  both  the  security  policy 
and  the  intrusion  detection  system  can  be  incorporated  explicitly  in  the  attack  model.  The  system 
administrator  can  then  perform  several  kinds  of  analysis  on  such  configuration-specific  attack  graphs 
to  assess  the  security  needs  of  the  network. 

Forensics 

After  a  break-in,  forensic  analysis  attempts  to  identify  the  actions  taken  by  the  attacker  and 
assess  the  damage.  If  legal  action  is  desired,  analysts  seek  evidence  that  a  sequence  of  sensor  alerts 
comprises  a  coherent  attack  plan,  and  is  not  merely  a  series  of  isolated,  benign  events.  This  task 
becomes  harder  when  the  intruders  obfuscate  attack  steps  by  slowing  down  the  pace  of  the  attack 
and  varying  specific  steps.  Researcher  have  suggested  that  matching  data  extracted  from  IDS  logs 
to  a  formal  reference  model  based  on  attack  graphs  can  strengthen  the  argument  that  the  intruder’s 
actions  were  malicious  [86]. 

Security  Policy  Considerations 

Experience  has  shown  that  organizations  arc  reluctant  to  change  their  security  policy  or  patch 
their  machines  even  in  the  face  of  evidence  that  their  network  is  vulnerable  to  attack.  Sometimes  it  is 
impractical  to  keep  all  of  the  machines  up  to  date — some  networks  contain  thousands  of  hosts,  and 
even  with  the  help  of  automation  applying  frequent  security  patches  is  a  challenge.  Organizations 
sometimes  keep  a  relatively  lax  security  policy  because  tightening  it  prevents  legitimate  users  from 
using  the  network  to  do  their  work.  For  example,  an  organization  may  choose  to  keep  an  FTP  server 
open,  or  give  its  employees  remote  access  to  the  network. 

Compiling  a  list  of  vulnerabilities  with  a  security  scanning  tool  may  not  be  enough  to  persuade 
the  organization  that  action  is  needed.  We  show  that  existing  vulnerability  scanners,  such  as  Nes- 
sus  [30]  can  be  connected  with  automated  attack  graph  generation  software.  Such  a  combination 
automatically  scans  a  network  and  generates  customized  attack  graphs.  An  attack  graph  showing 
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multiple  concrete  break-in  scenarios  that  could  be  executed  on  the  organization’s  network  could  be 
used  as  persuasive  evidence  that  the  organization’s  security  policies  arc  too  lax. 

Roadmap 

Although  our  study  of  attack  graphs  and  related  models  is  motivated  primarily  by  applications 
to  network  security,  in  the  interest  of  keeping  other  research  avenues  open  we  make  our  definitions 
as  broad  as  possible  and  make  them  applicable  to  any  context  where  active  agents  attack  and  defend 
a  complex  system.  In  Chapter  7  we  give  a  broad  definition  for  attack  models  and  attack  graphs. 
Chapter  8  narrows  the  definitions  specifically  to  the  domain  of  network  security.  In  Chapter  9  we 
consider  some  post-generation  analyses  that  can  be  done  on  attack  graphs  to  help  solve  real-world 
questions  about  network  security.  Chapter  10  illustrates  the  definitions  and  analysis  techniques  with 
a  small  example  network.  Chapter  1 1  focuses  on  the  practical  aspects  of  building  a  usable  attack 
graph  tool.  We  discuss  several  approaches  to  collecting  the  data  necessary  to  build  the  network 
model.  Finally,  we  review  related  work  in  Chapter  12  and  state  our  conclusions  in  Chapter  13. 


Chapter  7 


Attack  Graphs 


Abstractly,  an  attack  graph  is  a  collection  of  scenarios  showing  how  a  malicious  agent  can  compro¬ 
mise  the  integrity  of  a  target  system.  Such  graphs  arc  a  natural  application  of  the  scenario  graph 
formalism  defined  in  Chapter  2.  With  a  suitable  model  of  the  system,  we  can  use  the  techniques 
developed  in  Paid  I  to  generate  attack  graphs  automatically.  In  this  context,  correctness  properties 
specify  the  negation  of  the  attacker’s  goal:  an  execution  is  correct  with  respect  to  the  property  if  the 
attacker  does  not  achieve  his  goal  for  that  execution.  We  call  such  properties  security  properties. 


7.1  Attack  Models 


Although  our  primary  interest  is  in  multi-stage  cyber-attacks  against  computer  networks,  we  define 
the  attack  graph  formalism  abstractly  as  a  scenario  graph  for  a  model  where  agents  attack  and  defend 
a  complex  system.  An  attack  model  W  =  (S,  r,  {so};  S,  S,  D)  is  a  Biichi  model  representing  a  set 
of  three  agents  1  =  {E,  D,  S}.  Agent  E  is  an  attacker,  agent  I)  is  a  defender,  and  agent  S  is  the 
system  under  attack.  The  specifics  of  how  each  agent  is  represented  in  an  attack  model  depend  on 
the  type  of  the  system  that  is  under  attack;  in  Chapter  8  we  specify  the  agents  more  precisely  for 
network  attack  models. 

With  each  agent  i  £  T  we  associate  a  set  of  actions  A,  so  that  the  total  set  of  actions  in 
the  model  is  A  =  U*ex  A-  In  general,  the  attacker’s  actions  move  the  system  “toward”  some 
undesirable  (from  the  system's  point  of  view)  state,  and  the  defender’s  actions  attempt  to  counteract 
that  effect.  There  is  a  single  root  state  so  representing  the  initial  state  of  each  agent  before  any 
action  has  taken  place. 

An  attack  model  is  a  general  formalism  suitable  for  modeling  a  variety  of  situations.  The  system 
under  attack  can  be  virtually  anything:  a  computer  network  under  attack  by  hackers,  a  city  under 
siege  during  war,  an  electrical  grid  targeted  by  terrorists,  etc.  The  attacker  is  an  abstract  represen¬ 
tation  of  a  group  of  agents  that  seek  to  move  the  system  to  a  state  that  is  inimical  to  the  system’s 
interests.  The  defender  is  an  agent  whose  explicit  purpose,  in  opposition  to  the  attacker,  is  to  prevent 
this  occurrence.  The  system  itself  is  oblivious  to  the  fact  that  it  is  under  attack.  It  goes  through  its 
normal  routine  according  to  its  purpose  and  goals  regardless  of  the  actions  of  the  active  players. 
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7.2  Attack  Graphs 

Formally,  an  attack  graph  is  a  safety  scenario  graph  with  an  appropriately  defined  safety  property 
(Section  2.1.8): 

Definition  7.1  Given  a  security  property  P,  the  attack  graph  for  an  attack  model  VV  =  (S',  r,  {so},  .S', 
S ,  D)  is  a  safety  scenario  graph  for  W  with  respect  to  P. 

Chapter  8  applies  this  notion  of  attack  graphs  to  the  network  security  domain  and  shows  how  to  gen¬ 
erate  intruder  attack  graphs  automatically  using  scenario  graph  generation  techniques  from  Paid  I. 


Chapter  8 


Network  Attack  Graphs 


Network  attack  graphs  represent  a  collection  of  possible  penetration  scenarios  in  a  computer  net¬ 
work.  Each  penetration  scenario  is  a  sequence  of  steps  taken  by  the  intruder,  typically  culminating 
in  a  particular  goal — administrative  access  on  a  particular  host,  access  to  a  database,  service  dis¬ 
ruption,  etc.  For  appropriately  constructed  network  models,  attack  graphs  give  a  bird’s-eye  view 
of  every  scenario  that  can  lead  to  a  serious  security  breach.  Sections  8.1  and  8.2  show  how  attack 
models  can  be  constructed  for  the  network  security  domain. 


8.1  Network  Attack  Model 

A  network  attack  model  is  an  attack  model  where  the  system  S  is  a  computer  network,  the  attacker 
E  is  a  malicious  agent  trying  to  circumvent  the  network’s  security,  and  the  defender  I)  represents 
both  the  system  administrator(s)  and  security  software  installed  on  the  network.  A  state  transition 
in  a  network  attack  model  corresponds  to  a  single  action  by  the  intruder,  a  defensive  action  by  the 
system  administrator,  or  a  routine  network  action. 

Real  networks  consist  of  a  large  variety  of  hardware  and  software  pieces,  most  of  which  arc 
not  involved  in  cyber  attacks.  We  have  chosen  six  network  components  relevant  to  constructing 
network  attack  models.  The  components  were  chosen  to  include  enough  information  to  represent 
a  wide  variety  of  networks  and  attack  scenarios,  yet  keep  the  model  reasonably  simple  and  small. 
The  following  is  a  list  of  the  components: 

1 .  H,  a  set  of  hosts  connected  to  the  network 

2.  C,  a  connectivity  relation  expressing  the  network  topology  and  inter-host  reachability 

3.  T,  a  relation  expressing  trust  between  hosts 

4.  /,  a  model  of  the  intruder 

5.  A,  a  set  of  individual  actions  (exploits)  that  the  intruder  can  use  to  construct  attack  scenarios 

6.  Ids,  a  model  of  the  intrusion  detection  system 

We  construct  an  attack  model  W  based  on  these  components.  The  following  table  defines  each 
agent  V s  state  S)  and  action  set  A ,  in  terms  of  the  network  components: 
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Agent  i  £  T 

Si 

Ai 

E 

I 

A 

D 

Ids 

{alarm.} 

S 

H  xC  xTx  S 

0 

This  construction  gives  the  security  administrator  an  entirely  passive  “detection”  role,  embodied 
in  the  alarm  action  of  the  intrusion  detection  system.  For  simplicity,  regular  network  activity  is 
omitted  entirely. 

It  remains  to  make  explicit  the  transition  relation  of  the  attack  model  VV.  Each  transition 
( s  i ,  6'2 )  £  r  is  either  an  action  by  the  intruder,  or  an  alarm  action  by  the  system  administrator. 
An  alarm  action  happens  whenever  the  intrusion  detection  system  is  able  to  flag  an  intruder  action. 
An  action  action  a  £  A  requires  that  the  preconditions  of  a  hold  in  state  s  \  and  the  postconditions 
of  a  hold  in  S2- 


8.2  Network  Components 

The  rest  of  this  chapter  is  devoted  to  explaining  the  network  components  in  more  detail. 

Hosts 

Hosts  are  the  main  hubs  of  activity  on  a  network.  They  run  services,  process  network  requests, 
and  maintain  data.  With  rare  exceptions,  every  action  in  an  attack  scenario  will  target  a  host  in 
some  way.  Typically,  an  action  takes  advantage  of  vulnerable  or  mis-configured  software  to  gain 
information  or  access  privileges  for  the  attacker.  The  main  goal  in  modeling  hosts  is  to  capture 
as  much  information  as  possible  about  components  that  may  contribute  to  creating  an  exploitable 
vulnerability. 

A  host  h  £  H  is  a  tuple  (id,  sves,  sw,  vuls),  where 

•  id  is  a  unique  host  identifier  (typically,  name  and  network  address) 

•  sves  is  a  list  of  service  name/port  number  pairs  describing  each  service  that  is  active  on  the 
host  and  the  port  on  which  the  service  is  listening 

•  sw  is  a  list  of  other  software  operating  on  the  host,  including  the  operating  system  type  and 
version 

•  vuls  is  a  list  of  host-specific  vulnerable  components.  This  list  may  include  installed  softw  are 
with  exploitable  security  flaws  (example:  a  setuid  program  with  a  buffer  overflow  problem), 
or  mis-configured  environment  settings  (example:  existing  user  shell  for  system-only  users, 

SUCh  SLSftp) 


Network  Connectivity 

Following  Ritchey  and  Ammann  [79],  connectivity  is  expressed  as  a  ternary  relation  C  C  fi  x 
H  x  P,  where  P  is  a  set  of  integer  port  numbers.  C(h  \ .  h-2-  p)  means  that  host  ha  is  reachable  from 
host  h\  on  port  p.  Note  that  the  connectivity  relation  incorporates  firewalls  and  other  elements  that 
restrict  the  ability  of  one  host  to  connect  to  another.  Slightly  abusing  notation,  we  say  R(h\ ,  ha ) 
when  there  is  a  network  route  from  h\  to  h?. 
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Trust 

We  model  trust  as  a  binary  relation  T  C  H  x  H,  where  T(h\ ,  /12)  indicates  that  a  user  may  log  in 
from  host  /12  to  host  h\  without  authentication  (i.e.,  host  h\  “trusts”  host  hi). 

Services 

The  set  of  services  S  is  a  list  of  unique  service  names,  one  for  each  service  that  is  present  on  any 
host  on  the  network.  We  distinguish  services  from  other  software  because  network  services  so  often 
serve  as  a  conduit  for  exploits.  Furthermore,  services  are  tied  to  the  connectivity  relation  via  port 
numbers,  and  this  information  must  be  included  in  the  model  of  each  host.  Every  service  name  in 
each  host’s  list  of  services  comes  from  the  set  S. 

Intrusion  Detection  System 

We  associate  a  boolean  variable  with  each  action,  abstractly  representing  whether  or  not  the  IDS 
can  detect  that  particular  action.  Actions  are  classified  as  being  either  detectable  or  stealthy  with 
respect  to  the  IDS.  If  an  action  is  detectable,  it  will  trigger  an  alarm  when  executed  on  a  host  or 
network  segment  monitored  by  the  IDS;  if  an  action  is  stealthy,  the  IDS  does  not  see  it. 

We  specify  the  IDS  as  a  function  ids:  H  x  El  x  A  — r  {d,  s,  6},  where  ids( h  \ ,  I12  ■  a)  =  d  if  action 
a  is  detectable  when  executed  with  source  host  h\  and  target  host  h 2;  ids( h  \ ,  /12,  a)  =  s  if  action  a  is 
stealthy  when  executed  with  source  host  hi  and  target  host  I12'.  and  ids( h  \ .  /i2,  a)  =  b  if  action  a  has 
doth  detectable  and  stealthy  strains,  and  success  in  detecting  the  action  depends  on  which  strain  is 
used.  When  h\  and  I12  refer  to  the  same  host,  ids( h \ ,  I12 ,  a)  specifies  the  intrusion  detection  system 
component  (if  any)  located  on  that  host.  When  h\  and  /12  refer  to  different  hosts,  ids( h  \ ,  I12 ,  a) 
specifies  the  intrusion  detection  system  component  (if  any)  monitoring  the  network  path  between 
h\  and  /12. 


Actions 

Each  action  is  a  triple  (r,  hs ,  hf),  where  hs  €  II  is  the  host  from  which  the  action  is  launched, 
ht  £  H  is  the  host  targeted  by  the  action,  and  r  is  the  rule  that  describes  how  the  intruder  can 
change  the  network  or  add  to  his  knowledge  about  it.  A  specification  of  an  action  rule  has  four 
components:  intruder  preconditions,  network  preconditions,  intruder  effects,  and  network  effects. 
The  intruder  preconditions  component  places  conditions  on  the  intruder’s  store  of  knowledge  and 
the  privilege  level  required  to  launch  the  action.  The  network  preconditions  specifies  conditions 
on  target  host  state,  network  connectivity,  trust,  services,  and  vulnerabilities  that  must  hold  before 
launching  the  action.  Finally,  the  intruder  and  network  effects  components  list  the  action’s  effects 
on  the  intruder  and  on  the  network,  respectively. 

Intruder 

The  intruder  has  a  store  of  knowledge  about  the  target  network  and  its  users.  The  intruder’s  store 
of  knowledge  includes  host  addresses,  known  vulnerabilities,  user  passwords,  information  gathered 
with  port  scans,  etc.  Also  associated  with  the  intruder  is  the  function  plvl:  Hosts  — >  {none,  user, 
root},  which  gives  the  level  of  privilege  that  the  intruder  has  on  each  host.  For  simplicity,  we  model 
only  three  privilege  levels.  There  is  a  strict  total  order  on  the  privilege  levels:  none  <  user  <  root. 

For  bookkeeping,  three  state  variables  specify  the  most  recent  action  triple  used  by  the  intruder: 
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variable 

meaning 

action 

action  rule 

source 

source  host 

target 

target  host 

8.2.1  Omitted  Complications 

Although  we  do  not  model  actions  taken  by  user  services  for  the  sake  of  simplicity,  doing  so  in  the 
future  would  let  us  ask  questions  about  effects  of  intrusions  on  service  quality.  A  more  complex 
model  could  include  services  provided  by  the  network  to  its  regular  users  and  other  routine  network 
traffic.  These  details  would  reflect  more  realistically  the  interaction  between  intruder  actions  and 
regular  network  activity  at  the  expense  of  additional  complexity. 

Another  activity  worth  modeling  explicitly  is  administrative  steps  taken  either  to  hinder  an  at¬ 
tack  in  progress  or  to  repair  the  damage  after  an  attack  has  occurred.  The  former  corresponds  to 
transitioning  to  states  of  the  model  that  offer  less  opportunity  for  further  penetration;  the  latter 
means  “undoing”  some  of  the  damage  caused  by  successful  attacks. 


Chapter  9 


Analysis  of  Attack  Graphs 


Attack  graphs  provide  a  bird’s-eye  view  of  what  can  go  wrong.  We  can  use  them  to  define  a  rough 
measure  of  how  exposed  the  network  is  to  malicious  penetration.  The  candidate  metrics  for  network 
exposure  include  total  graph  size  and  the  number  of  successful  attack  scenarios.  Whichever  metric 
is  used,  there  arc  many  follow-up  questions  worth  exploring.  The  following  is  a  partial  list: 

•  Find  successful  attack  scenarios  undetected  by  the  intrusion  detection  system  or  unaffected 
by  contemplated  security  upgrades 

•  Determine  where  to  position  new  IDS  components  for  best  coverage,  or  where  to  target  secu¬ 
rity  upgrades 

•  Predict  effectiveness  of  various  security  policies  and  different  software  and  hardware  config¬ 
urations 

•  Predict  the  changes  in  overall  network  vulnerability  that  would  occur  if  new  exploits  of  a 
certain  type  became  available 

•  Identify  worst  case  scenarios  and  prioritize  defenses  accordingly 

•  Explore  trade-offs  between  various  approaches  to  defending  the  network 

•  Assess  the  effectiveness  of  “vigilant”  strategies  that  attempt  to  defend  the  system  dynamically, 
while  the  attack  is  in  progress 

Instead  of  trying  to  cover  all  the  bases,  we  will  present  example  analyses  that  focus  on  finding 
the  best  upgrade  strategy  when  existing  defenses  arc  deemed  inadequate.  For  our  purposes  it  is 
necessary  to  distinguish  between  two  different  defensive  strategies — static  and  dynamic  defense. 

Static  Defense  Analysis 

A  static  defense  measure  is  a  one-shot  change  in  the  network  configuration  that  reduces  network 
exposure.  Static  measures  include  imposing  a  stricter  password  policy,  applying  security  patches 
and  upgrades,  or  tightening  firewall  rules.  A  single  static  measure  corresponds  to  one  modification 
of  the  attack  graph:  it  reduces  the  number  of  scenarios  that  the  intruder  can  execute,  but  does 
not  present  any  additional  obstacles  to  the  attacker  while  the  attack  is  underway.  Generally,  it  is 
possible  to  perform  analyses  of  static  measures  directly  on  the  attack  graphs,  without  referencing 
the  underlying  attack  model. 
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When  deciding  on  a  static  security  upgrade,  the  system  administrator  may  have  a  number  of 
options  and  wish  to  evaluate  their  relative  effectiveness.  In  Section  9.1  we  show  how  to  determine 
a  minimal  set  of  atomic  attacks  that  must  be  prevented  to  guarantee  that  the  intruder  cannot  achieve 
his  goal  in  some  particular  attack  graph.  Appendix  A  discusses  an  alternative  approach  to  static 
analysis  that  uses  probabilistic  models. 

Dynamic  Defense  Analysis 

Dynamic  network  defense  measures  are  not  yet  common  in  modern  network  setups,  with  the 
notable  exception  of  intrusion  detection  systems.  A  dynamic  defensive  mechanism  continually 
monitors  the  network  and  attempts  actively  to  prevent  the  attacker  from  succeeding.  It  is  a  safe 
prediction  that  in  the  future  network  software  will  become  more  aware  of  its  own  state,  and  active 
defense  will  play  a  correspondingly  larger  role  in  network  security. 

In  the  presence  of  dynamic  defensive  components  an  attack  scenario  becomes  a  game  of  moves 
by  the  attacker  and  counter-moves  by  the  defender,  so  it  is  logical  to  employ  a  full  game-theoretic 
treatment  in  analyzing  effectiveness  of  dynamic  defense.  Appendix  A  contains  a  game-theoretic 
definition  of  attack  models  and  attack  graphs  and  discusses  analysis  of  attack  graphs  based  on  the 
game -theoretic  definitions. 


9.1  Static  Defense:  Graph-Theoretic  Exposure  Minimization 

Once  we  have  an  attack  graph  generated  for  a  specific  network  with  respect  to  a  given  security 
property,  we  can  attempt  to  determine  the  best  way  to  reduce  the  network  exposure  to  attack.  In  this 
section  we  look  at  ways  to  choose  the  best  security  upgrade  from  a  set  of  defensive  measures,  such 
as  deploying  additional  intrusion  detection  tools,  adding  firewalls,  upgrading  software,  deleting  user 
accounts,  changing  access  rights,  etc. 

We  say  that  a  measure  covers  an  action  if  it  renders  the  action  ineffective  for  the  intruder.  Previ¬ 
ous  work  [52,  80]  considers  the  question  of  what  measures  a  system  administrator  should  deploy  to 
make  the  system  completely  safe.  We  briefly  review  the  key  definitions  and  results  and  then  extend 
them. 

Assume  that  we  have  produced  an  attack  graph  G  corresponding  to  the  security  property 

P  =  G(~^unsafe) 

Since  P  is  a  safety  property,  we  know  that  G  is  a  safety  scenario  graph.  Let  A  be  the  set  of  actions, 
and  G  =  ( S ,  E ,  sq>  s/,  D)  be  the  attack  graph,  where  S  is  the  set  of  states,  E  C  S  x  S  is  the  set 
of  edges,  sq  £  S  is  the  initial  state,  sj  £  S  is  the  final  state,  and  D  :  E  — >  A  U  {e}  is  a  labeling 
function  where  D(e)  =  a  if  an  edge  e  =  (s  — ►  s')  corresponds  to  an  action  a,  otherwise  D(e)  =  e. 
Edges  labeled  with  e  represent  system  transitions  that  do  not  correspond  to  an  action. 

Without  loss  of  generality  we  assume  that  there  is  a  single  initial  state  and  a  single  final  state. 
It  is  easy  to  convert  an  attack  graph  with  multiple  initial  states  Sq,  ■  ■  • , Sq  and  final  states  .sj,  •  •  • ,  sj 
into  an  attack  graph  that  accepts  the  same  set  of  scenarios  but  has  a  single  initial  state  and  a  single 
final  state.  We  simply  add  a  new  initial  state  so  and  a  new  final  state  s j  with  e-labeled  edges  (so,  s™) 
(1  <  m  <  j )  and  (ss,  sj)  (1  <t<  u). 

Suppose  that  we  are  given  a  finite  set  of  measures  M  =  {mi,  •  •  • ,  m/,. }  and  a  function  covers  : 
M  — >  2a,  where  a  £  covers (rrii )  if  adopting  measure  m,  removes  the  action  a  from  the  intruder’s 
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arsenal.  We  want  to  find  either  a  subset  M'  C  M  or  a  subset  A'  C  .4  of  smallest  size,  such  that 
adopting  the  measures  in  the  set  M'  or  installing  defenses  against  all  attacks  in  A'  will  make  the 
network  safe.  These  two  problems  are  related.  To  state  them  formally,  we  need  a  few  definitions. 

A  path  7T  in  G  =  (S,  E,  sq,  Sf,  D)  is  sequence  of  states  s\,  -  ■  ■  ,sn,  such  that  s,  G  S  and 
(sj,  Sj+i)  G  E,  where  1  <  i  <  n.  A  complete  path  starts  from  the  initial  state  so  and  ends  in 
the  final  state  Sf.  The  label  of  a  path  ir  =  si,  ■  ■  ■ ,  sn  is  a  subset  of  the  set  of  actions  A.  Abusing 
notation,  we  will  denote  the  label  of  path  7r  as  D(ir). 

n —  1 

UfD<  sii  si+l )}  \  {e}  • 

i=l 

D(tt)  represents  the  set  of  actions  used  on  the  path  ir.  A  subset  of  actions  A'  C  ,4  is  called  realizable 
in  the  attack  graph  G  iff  there  exists  a  complete  path  tt  in  G  such  that  D(tt)  =  A! .  In  other  words, 
an  intruder  can  use  the  set  of  actions  A!  to  reach  the  final  state  from  the  initial  state.  We  denote  the 
set  of  all  realizable  sets  in  an  attack  graph  G  as  Rel(G). 

Given  a  state  s  G  S',  a  set  of  actions  C(s)  is  critical  with  respect  to  s  if  and  only  if  the  intruder 
cannot  reach  his  goal  from  s  when  the  actions  in  C(s)  are  removed  from  his  arsenal.  Equivalently, 
C(s)  is  critical  with  respect  to  s  if  and  only  if  every  path  from  s  to  the  final  state  s  j  has  at  least  one 
edge  labeled  with  an  action  a  G  C(s). 

A  critical  set  of  actions  C(s)  corresponding  to  a  state  s  is  minimum  if  there  is  no  critical  set 
C'(s)  such  that  \C'(s) \  <  |C(s)|.  In  general,  there  can  be  multiple  minimum  sets  corresponding 
to  a  state  s.  Of  course,  all  minimum  critical  sets  must  be  of  the  same  size.  We  say  that  a  critical 
set  of  actions  corresponding  to  the  initial  state  so  is  also  a  critical  set  of  actions  for  the  attack  graph 

G  =  (S,E,s0,sf,D). 

Similarly,  given  a  state  s  G  S',  a  set  of  measures  Mc(s)  is  critical  with  respect  to  s  if  and  only 
if  the  intruder  cannot  reach  his  goal  from  s  when  the  actions  in  (J meMc(s)  cover  s{m)  arc  removed 
from  his  arsenal.  A  critical  set  of  measures  Mc(s )  corresponding  to  a  state  s  is  minimum  if  there  is 
no  critical  set  M'c(s)  such  that  |M'(s)|  <  |Mc(s)|.  We  can  now  state  formally  two  problems: 

Definition  9.1  Minimum  Critical  Set  of  Actions  (MCSA) 

Given  an  attack  graph  G  =  (S,  E,  so,  ss,  D )  and  a  set  of  actions  A,  find  a  minimum  critical 
subset  of  actions  C(s o)  C  A  for  G. 


Definition  9.2  Minimum  Critical  Set  of  Measures  (MCSM) 

Given  an  attack  graph  G  =  (S,  E,  sq,ss,  D ),  a  set  of  actions  A,  and  a  set  of  defensive  measures 
M  C  V(A),find  a  minimum  critical  subset  of  measures  Mc(sq)  C  M. 

MCSA  is  ./VP-complete  [52].  There  is  a  trivial  reduction  from  MCSA  to  MCSM;  given  an  instance 
( G ,  A)  of  MCSA,  we  can  construct  an  instance  (G,  A,  M)  of  MCSM  where  M  =  {{a}  |  a  G  A}. 
This  reduction  shows  that  MCSM  is  also  ArP-complctc.  To  solve  these  problems  efficiently,  we 
employ  heuristics. 

MCSA  and  MCSM  fall  into  the  broad  class  of  convex  optimization  problems.  They  arc  closely 
related  to  two  other  members  of  that  class.  Minimum  Set  Cover  and  Minimum  Hitting  Set. 

Definition  9.3  Minimum  Set  Cover  (MSC) 

Given  a  collection  J  of  subsets  of  a  finite  set  C,  find  a  minimum  subcollection  J'  C  J  such 
that  (^J  ^ .  fz  i  S>i  —  C . 
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Input: 

C,  a  fi  nite  set 

J ,  a  collection  of  subsets  of  C 

Output: 

J'  C  J .  a  set  cover  of  C 
Algorithm  GREEDY-SET-COVER(C,  J) 

J'  ^0; 

C'  <-  C; 

do 

Pick  j  a  J\J'  that  maximizes  |  j  Pi  C'/ 1 ; 
U  {i}; 

C'  <-C'\j\ 
until  C'  is  empty; 

return  J'\ 

(a)  GREEDY-SET-COVER 


Input: 

G  =  (S,  E,  so,  Sf,  D),  an  attack  graph 
A,  an  action  set 

Output: 

A'  C  A,  a  critical  set  of  actions  in  G 

Algorithm  GREEDY-CRITICAL-SET(G,  A) 
A'  <-  0; 

G'  G; 

do 

Pick  a  £  A  \  A  that  maximizes  /; lq /  (a); 
A'  <-  A'  U  {a}; 

Remove  all  edges  labeled  with  a  from  G'\ 
until  G'  is  empty; 

return  A'; 


(b) GREEDY-CRITICAL-SET 


Figure  9.1:  Greedy  Approximation  Algorithms 


Definition  9.4  Minimum  Hitting  Set  (MHS) 

Given  a  collection  J  of  subsets  of  a  finite  set  C,  find,  a  minimum  subset  H  C  C  that  has  a 
nonempty  intersection  with  every  member  of  J. 

It  has  been  shown  that  MSC  and  MHS  are  structurally  equivalent  [6]. 

Jha,  Sheyner,  and  Wing  show  MCSA  to  be  structurally  equivalent  to  MHS  and,  by  transitivity, 
to  MSC  [52],  An  approximate  solution  to  MCSA  is  found  using  a  variant  of  the  classic  greedy  MSC 
approximation  algorithm  [22],  shown  in  Figure  9.1(a).  The  corresponding  MCSA  approximation 
algorithm  GREEDY-CRITICAL-SET  is  specified  in  Figure  9.1(b).  It  finds  a  critical  set  of  actions 
for  an  attack  graph  G  =  ( S ,  E,  sq,  Sf,  D ).  GREEDY-CRITICAL-SET  makes  use  of  the  function 
Pg  :  A  — >  N,  where  Hc(a)  is  the  number  of  realizable  sets  in  the  attack  graph  G  that  contain  the 
action  a.  Formally, 


pG{a)  =  \{A‘\a  €  A'  and  A'  e  Rel(G)} \  . 

By  structural  equivalence,  the  approximation  ratio  Rgcs  for  GREEDY-CRITICAL-SET  is  identical 
to  the  ratio  for  the  greedy  set  cover  algorithm: 

Rgcs  =  In  n  —  In  In  n  +  0(1) 


where  n  =  \Rel(G)\  [84], 

To  solve  MCSM,  we  use  measures  instead  of  individual  actions  in  another  variant  of  the  same 
greedy  algorithm.  To  justify  this  approach,  we  show  via  a  sequence  of  reductions  that  MCSM  is 
structurally  equivalent  to  MSC.  Throughout  the  rest  of  this  section,  structure -preserving  reduction 
from  problem  A  to  problem  B  is  denoted  A  <s  B. 

Definition  9.5  Minimum  Hitting  Cover  ( MHC ) 

Given  two  collections  J .  K,  of  subsets  of  a  finite  set  C,  find  a  minimum  subcollection  K.  C.  JC' 
such  that  Vj  G  J ,  we  have 
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Input: 

G  —  (S,  E,  so,  Sf,  D),  an  attack  graph 
A,  an  action  set 

M  C  V(A),  a  set  of  security  measures 

Output: 

M'  C  M,  a  critical  set  of  measures  in  G 

Algorithm  GREEDY-MEASURE-SET(G,  M) 

M'  <-  0; 

G'  —  G; 

do 

Pick  m  £  M  \  M'  that  maximizes  vq*  (o); 
M'  A/'U{m}; 

Remove  all  edges  labeled 

with  any  attack  in  m  from  G'\ 
until  G'  is  empty; 
return  M'\ 

(b) GREEDY-MEASURE-SET 

Figure  9.2:  Greedy  Measure  Set 


IJ  k  n  j  +  0. 

k£K 


Definition  9.6  Minimum  Powerset  Cover  (MPC) 

Let  C  be  a  finite  set  and  T  C  V(V(C))  a  collection  of  sets  of  subsets  of  C.  Find  a  minimum 
subcollection  T'CT  such  that 


U  U  SltK,S,=C. 

Me  T' 


Below  we  prove  that  MCSM  <5  MHC  <5  MPC  <5  MSC.  The  relationship  between  MCSM  and 
MSC  lets  us  use  a  variant  of  GREEDY- SET-COVER  for  approximating  MCSM.  The  algorithm 
GREEDY-MEASURE-SET  is  shown  in  Figure  9.2.  It  finds  a  set  of  measures  that  cuts  attack  graph 
G  =  (5,  E.  so,  Sf,  D).  The  algorithm  uses  the  function  uq  :  Af  — >  R,  where  uc(rn)  is  the  number 
of  realizable  sets  in  the  attack  graph  G  that  contain  some  attack  from  measure  nr. 

vcim)  =  l-fyl'lm  fli'/O  and  A'  £  Rel(G)} \  . 

By  structural  equivalence,  the  approximation  ratio  Rgrns  for  GREEDY-MEASURE-SET  is  identical 
to  the  ratio  for  the  greedy  set  cover  algorithm: 

Rgms  =  In  n  —  In  In  n  +  0(1) 

where  n  =  \Rel(G)\  [84].  It  remains  to  prove  the  reductions  MCSM  <5  MHC  <5  MPC  <5  MSC. 


Theorem  9.1  MCSM  <s  MHC. 
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Proof: 

Let  (G,  A,  M )  be  an  instance  of  Minimum  Critical  Set  of  Measures.  The  goal  of  MCSM  is  to 
select  a  subcollection  of  measures  that  together  hit  every  realizable  set  of  attacks  in  G  at  least  once. 
We  construct  an  instance  (C,  J ■  1C)  of  Minimum  Hitting  Cover  with 


C  =  A 


J  =  Rel(G) 
1C  =  M 


Every  feasible  solution  1C  C  /C  of  Minimum  Hitting  Cover  will  ensure  that  every  set  in  J  is  hit 
at  least  once.  Equivalently,  in  the  MCSM  instance  the  same  solution  will  ensure  that  every  set  in 
Rel(G)  is  hit  by  at  least  one  measure,  as  desired. 

□ 


Theorem  9.2  MHC  <s  MPC. 

Proof: 

MHC  and  MPC  arc  duals  of  each  other,  and  it  is  possible  to  turn  one  into  the  other  with  a 
simple  substitution.  Given  an  instance  (C,  3 , 1C)  of  MHC,  let  /  be  a  function  that  assigns  a  name 
from  the  name  set  Nm  =  {j \ . . . .  ,jn }  (n  =  \J\)  to  every  set  in  the  collection  J,  so  that  J  = 

{/_1(i l),  •  •  •  ,/_10n)}. 

We  construct  an  instance  of  MPC  by  substituting  for  each  element  c  E  C  the  subset  of  names 
Nm  £  Nm  such  that  each  element  named  in  Nm  contains  c.  That  is,  Vj  £  Nm',  we  have  c  £  /-1(j). 

This  substitution  changes  1C  into  a  set  T  of  sets  of  names  from  Nm,  and  the  task  of  selecting  a 
minimum  subcollection  of  1C  that  hits  every  member  of  collection  J  turns  into  the  task  of  covering 
the  set  of  names  Nm  with  elements  of  T.  In  other  words,  (Nm,  T)  is  an  instance  of  MPC.  A  solution 
to  the  MPC  (Nm,  T)  also  gives  the  (dual)  solution  to  the  corresponding  MHC  (C,  J ,  1C). 

a 


Theorem  9.3  MPC  <s  MSC. 

Proof: 

This  reduction  is  simple.  Let  (C,  T)  be  an  instance  of  Minimum  Powerset  Cover.  The  goal  is  to 
pick  a  number  of  items  from  T  that  together  cover  C.  Each  item  v  £  T  is  a  set  of  subsets  of  C.  But 
v  covers  exactly  the  same  portion  of  C  as  its  “flattened”  version,  (Jjev  j-  So  we  can  construct  an 
instance  of  Minimum  Set  Cover  (C ,  J)  by  carrying  C  unchanged  and  flattening  every  item  in  T : 

C  =  C 

J  =  { 

j&v 


A  solution  of  MSC  for  (C' ,  J)  is  thus  also  an  MPC  solution  for  (C,  T). 


n 
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Jha,  Sheyner,  and  Wing  have  suggested  a  different,  two-step  approach  for  finding  approximate 
solutions  to  MCSM  [52].  In  the  first  step,  they  find  a  critical  set  of  actions  A'  using  the  approxima¬ 
tion  algorithm  for  MCSA.  In  the  second  step,  the  MSC  approximation  algorithm  in  Figure  9.1(a) 
takes  the  set  of  actions  A'  computed  in  the  first  step  and  covers  them  with  measures.  This  ap¬ 
proach  does  not  offer  any  guarantee  that  the  resulting  set  of  measures  is  a  good  approximation  to 
the  MCSM.  While  the  set  of  actions  A'  is  a  good  approximation  to  the  minimum  critical  set  of 
actions,  it  may  not  be  the  right  set  of  actions  to  cover  with  the  available  measures.  In  fact,  it  is 
possible  that  the  entire  set  of  available  measures  cannot  cover  A' ,  but  can  nevertheless  cover  some 
other  critical  set  of  actions.  Further,  even  when  some  subset  of  measures  Mc  C  M  can  cover  A! ,  it 
is  possible  that  a  smaller  subset  of  measures  M'c  C  M  can  cover  some  other,  perhaps  much  larger, 
critical  set  of  actions. 
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Chapter  10 


Example  Network 


Figure  10.1  shows  an  example  network.  There  are  two  target  hosts,  Windows  and  Linux,  on 
an  internal  company  network,  and  a  Web  server  on  an  isolated  “demilitarized  zone”  (DMZ)  net¬ 
work.  One  firewall  separates  the  internal  network  from  the  DMZ  and  another  firewall  separates  the 
DMZ  from  the  rest  of  the  Internet.  An  intrusion  detection  system  (IDS)  watches  the  network  traffic 
between  the  internal  network  and  the  outside  world. 

The  Linux  host  on  the  internal  network  is  running  several  services — Linux  “I  Seek  You” 
(. LICQ )  chat  software,  Squid  web  proxy,  and  a  Database.  The  LICQ  client  lets  Linux  users  ex¬ 
change  text  messages  over  the  Internet.  The  Squid  web  proxy  is  a  caching  server.  It  stores  requested 
Internet  objects  on  a  system  closer  to  the  requesting  site  than  to  the  source.  Web  browsers  can 
then  use  the  local  Squid  cache  as  a  proxy,  reducing  access  time  as  well  as  bandwidth  consumption. 
The  host  inside  the  DMZ  is  running  Microsoft’s  Internet  Information  Services  (IIS)  on  a  Windows 
platform. 

The  intruder  launches  his  attack  starting  from  a  single  computer,  which  lies  on  the  outside 
network.  To  be  concrete,  let  us  assume  that  his  eventual  goal  is  to  disrupt  the  functioning  of  the 
database.  To  achieve  this  goal,  the  intruder  needs  root  access  on  the  database  host  Linux.  The  five 
actions  at  his  disposal  are  summarized  in  the  following  table: 


Intruder 


5  LICQ 
^  Squid 


firewall 


firewall 


Linux 


9  Database 


Figure  10.1:  Example  Network 
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Action 

Effect 

Example  CVE  ID 

IIS  buffer  overflow 

remotely  get  root 

CAN-2002-0364 

Squid  port  scan 

port  scan 

CVE- 2001-1030 

LICQ  gain  user 

gain  user  privileges  remotely 

CVE-200 1-0439 

scripting  exploit 

gain  user  privileges  remotely 

CAN-2002-0193 

local  buffer  overflow 

locally  get  root 

CVE-2002-0004 

Each  of  the  five  actions  corresponds  to  a  real-world  vulnerability  and  has  an  entry  in  the  Common 
Vulnerabilities  and  Exposures  (CVE)  database.  CVE  [96]  is  a  standard  list  of  names  for  vulnera¬ 
bilities  and  other  information  security  exposures.  A  CVE  identifier  is  an  eight-digit  string  prefixed 
with  the  letters  “CVE”  (for  accepted  vulnerabilities)  or  “CAN”  (for  candidate  vulnerabilities). 

The  IIS  buffer  overflow  action  exploits  a  buffer  overflow  vulnerability  in  the  Microsoft  IIS  Web 
Server  to  gain  administrative  privileges  remotely. 

The  Squid  action  lets  the  attacker  scan  network  ports  on  machines  that  would  otherwise  be 
inaccessible  to  him,  taking  advantage  of  a  mis-configured  access  control  list  in  the  Squid  web  proxy. 

The  LICQ  action  exploits  a  problem  in  the  URL  parsing  function  of  the  LICQ  software  for  Unix- 
flavor  systems.  An  attacker  can  send  a  specially-crafted  URL  to  the  LICQ  client  to  execute  arbitrary 
commands  on  the  client’s  computer,  with  the  same  access  privileges  as  the  user  of  the  LICQ  client. 

The  scripting  action  lets  the  intruder  gain  user  privileges  on  Windows  machines.  Microsoft 
Internet  Explorer  5.01  and  6.0  allow  remote  attackers  to  execute  arbitrary  code  via  malformed 
Content-Disposition  and  Content-Type  header  fields  that  cause  the  application  for  the  spoofed  file 
type  to  pass  the  file  back  to  the  operating  system  for  handling  rather  than  raise  an  error  message. 
This  vulnerability  may  also  be  exploited  through  HTML  formatted  email.  The  action  requires  some 
social  engineering  to  entice  a  user  to  visit  a  specially-formatted  Web  page.  However,  the  action  can 
work  against  firewalled  networks,  since  it  requires  only  that  internal  users  be  able  to  browse  the 
Web  through  the  firewall. 

Finally,  the  local  buffer  overflow  action  can  exploit  a  multitude  of  existing  vulnerabilities  to  let 
a  user  without  administrative  privileges  gain  them  illegitimately.  For  the  CVE  number  referenced 
in  the  table,  the  action  exploits  buffer  overflow  in  the  at  program.  The  at  program  is  a  Linux  utility 
for  queuing  shell  commands  for  later  execution. 

Some  of  the  actions  that  we  model  have  multiple  instantiations  in  the  CVE  database.  For  ex¬ 
ample,  the  local  buffer  overflow  action  exploits  a  common  coding  error  that  occurs  in  many  Linux 
programs.  Each  program  vulnerable  to  local  buffer  overflow  has  a  separate  CVE  entry,  and  all  such 
entries  correspond  to  the  same  action  rule.  The  table  lists  only  one  example  CVE  identifier  for  each 
rule. 


10.1  Example  Network 

Target  Network 

We  specify  the  state  of  the  network  to  include  services  running  on  each  host,  existing  vulnerabilities, 
and  connectivity  between  hosts.  There  arc  five  boolean  variables  for  each  host,  specifying  whether 
any  of  the  three  services  arc  running  and  whether  either  of  two  other  vulnerabilities  arc  present  on 
that  host: 
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variable 

meaning 

w3svc/j 

IIS  web  service  running  on  host  h 

squid  h 

Squid  proxy  running  on  host  h 

licqh 

LICQ  running  on  host  h 

scriptings 

HTML  scripting  is  enabled  on  host  h 

vul-ats 

at  executable  vulnerable  to  overflow  on  host  h 

The  model  of  the  target  network  includes  connectivity  information  among  the  four  hosts.  The  initial 
value  of  the  connectivity  relation  R  is  shown  the  following  table.  An  entry  in  the  table  corresponds 
to  a  pair  of  hosts  (hi,  h/i).  IIS  and  Squid  listen  on  port  80  and  the  LICQ  client  listens  on  port  5190, 
and  the  connectivity  relation  specifies  which  of  these  services  can  be  reached  remotely  from  other 
hosts.  Each  entry  consists  of  three  boolean  values.  The  first  value  is  ‘y’  if  h\  and  h-2  arc  connected 
by  a  physical  link,  the  second  value  is  ‘y’  if  h\  can  connect  to  h-2  on  port  80,  and  the  third  value  is 
‘y’  if  h\  can  connect  to  h-2  on  port  5190. 


Host 

Intruder 

IIS  Web  Server 

Windows 

Linux 

Intruder 

y>y>y 

YAT 

n,n,n 

n,n,n 

IIS  Web  Server 

y,n,n 

y.y.y 

y.y.y 

y.y.y 

Windows 

n,n,n 

y.y.“ 

y.y.y 

y.y.y 

Linux 

n,n,n 

y.y.“ 

y.y.y 

y.y.y 

We  use  the  connectivity  relation  to  reflect  the  settings  of  the  firewall  as  well  as  the  existence  of 
physical  links.  In  the  example,  the  intruder  machine  initially  can  reach  only  the  Web  server  on  port 
80  due  to  a  strict  security  policy  on  the  external  firewall.  The  internal  firewall  is  initially  used  to 
restrict  internal  user  activity  by  disallowing  most  outgoing  connections.  An  important  exception  is 
that  internal  users  arc  permitted  to  contact  the  Web  server  on  port  80. 

In  this  example  the  connectivity  relation  stays  unchanged  throughout  an  attack.  In  general,  the 
connectivity  relation  can  change  as  a  result  of  intruder  actions.  For  example,  an  action  may  enable 
the  intruder  to  compromise  a  firewall  host  and  relax  the  firewall  rules. 

Intrusion  Detection  System 

A  single  network-based  intrusion  detection  system  protects  the  internal  network.  The  paths 
between  hosts  Intruder  and  Web  and  between  Windows  and  Linux  are  not  monitored;  the 
IDS  can  see  the  traffic  between  any  other  pair  of  hosts.  There  are  no  host-based  intrusion  detection 
components.  The  IDS  always  detects  the  LICQ  action,  but  cannot  see  any  of  the  other  actions.  The 
IDS  is  represented  with  a  two-dimensional  array  of  bits,  shown  in  the  following  table.  An  entry  in 
the  table  corresponds  to  a  pair  of  hosts  (hi,  h‘2).  The  value  is  ‘y’  if  the  path  between  h\  and  h 2  is 
monitored  by  the  IDS,  and  ’n’  otherwise. 


Host 

Intruder 

IIS  Web  Server 

Windows 

Linux 

Intruder 

n 

n 

y 

y 

IIS  Web  Server 

n 

n 

y 

y 

Windows 

y 

y 

n 

n 

Linux 

y 

y 

n 

n 

The  Intruder 
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The  intruder’s  store  of  knowledge  consists  of  a  single  boolean  variable  ’scan’.  The  variable 
indicates  whether  the  intruder  has  successfully  performed  a  port  scan  on  the  target  network.  For 
simplicity,  we  do  not  keep  track  of  specific  information  gathered  by  the  scan.  It  would  not  be 
difficult  to  do  so,  at  the  cost  of  increasing  the  size  of  the  state  space. 

Initially,  the  intruder  has  root  access  on  his  own  machine  Intruder,  but  no  access  to  the  other 
hosts.  The  ’scan’  variable  is  set  to  false. 

Action  Rules 

There  arc  five  action  rules  corresponding  to  the  five  actions  in  the  intruder’s  arsenal.  Throughout 
the  description,  S  is  used  to  designate  the  source  host  and  T  the  target  host.  R(S.  T,p)  says  that 
host  T  is  reachable  from  host  S  on  port  p.  The  abbreviation  plvl(X)  refers  to  the  intruder’s  current 
privilege  level  on  host  X. 

Recall  that  a  specification  of  an  action  rule  has  four  components:  intruder  preconditions,  net¬ 
work  preconditions,  intruder  effects,  and  network  effects.  The  intruder  preconditions  component 
places  conditions  on  the  intruder’s  store  of  knowledge  and  the  privilege  level  required  to  launch 
the  action.  The  network  preconditions  component  specifies  conditions  on  target  host  state,  network 
connectivity,  trust,  services,  and  vulnerabilities  that  must  hold  before  launching  the  action.  Finally, 
the  intruder  and  network  effects  components  list  the  effects  of  the  action  on  the  intruder’s  state  and 
on  the  network,  respectively. 

IIS  Buffer  Overflow 

This  remote-to-root  action  immediately  gives  a  remote  user  a  root  shell  on  the  target  machine. 


action  IlS-buffer-overflow  is 
intruder  preconditions 
plvl(S)  >  user 
plvl{T )  <  root 
network  preconditions 
w3svc  T 
R{S,  T,  80) 
intruder  effects 
plvl(T)  :=  root 
network  effects 
-iw3svcj’ 

end 


User-level  privileges  on  host  S 
No  root-level  privileges  on  host  T 

Host  T  is  running  vulnerable  IIS  server 
Host  T  is  reachable  from  S  on  port  80 

Root-level  privileges  on  host  T 

Host  T  is  not  running  IIS 


Squid  Port  Scan 

The  Squid  port  scan  action  uses  a  mis-configured  Squid  web  proxy  to  conduct  a  port  scan  of  neigh¬ 
boring  machines  and  report  the  results  to  the  intruder. 


action  squid-port-scan  is 

intruder  preconditions 
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plvl(S)  =  user 
-■scan 

network  preconditions 

squidy 
R(S,  T,  80) 

intruder  effects 

scan 

network  effects 

0 

end 


User-level  privileges  on  host  S 
We  have  not  yet  performed  a  port  scan 

Host  T  is  running  vulnerable  Squid  proxy 
Host  T  is  reachable  from  S  on  port  80 

We  have  performed  a  port  scan  on  the  network 

No  changes  to  the  network  component 


LICQ  Remote  to  User 

This  remote-to-user  action  immediately  gives  a  remote  user  a  user  shell  on  the  target  machine. 
The  action  rule  assumes  that  a  port  scan  has  been  performed  previously,  modeling  the  fact  that 
such  actions  typically  become  apparent  to  the  intruder  only  after  a  scan  reveals  the  possibility  of 
exploiting  software  listening  on  lesser-known  ports. 


action  LICQ-remote-to-user  is 
intruder  preconditions 

plvl(S)  >  user 
plvl(T )  =  none 
scan 

network  preconditions 

licq-p 

R(S ,  T,  5190) 

intruder  effects 

plvl(T )  :=  user 

network  effects 

0 


User-level  privileges  on  host  S 

No  user-level  privileges  on  host  T 

We  have  performed  a  port  scan  on  the  network 

Host  T  is  running  vulnerable  LICQ  software 
Host  T  is  reachable  from  S  on  port  5190 

User-level  privileges  on  host  T 

No  changes  to  the  network  component 


Scripting  Action 

This  remote-to-user  action  immediately  gives  a  remote  user  a  user  shell  on  the  target  machine.  The 
action  rule  does  not  model  the  social  engineering  required  to  get  a  user  to  download  a  specially- 
created  Web  page. 


action  client-scripting  is 
intruder  preconditions 
plvl(S)  >  user 
plvl{T )  =  none 
network  preconditions 
scripting'/ 


User-level  privileges  on  host  S 
No  user-level  privileges  on  host  T 

HTML  scripting  is  enabled  on  host  T 
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Begin 


Figure  10.2:  Example  Attack  Graph 


R(T,  S,  80)  Host  S  is  reachable  from  T  on  port  80 

intruder  effects 

plvl[T )  :=  user  User-level  privileges  on  host  T 

network  effects 

0  No  changes  to  the  network  component 

end 


Local  Buffer  Overflow 

If  the  intruder  has  acquired  a  user  shell  on  the  target  machine,  this  action  exploits  a  buffer  overflow 
vulnerability  on  a  setuid  root  file  (in  this  case,  the  at  executable)  to  gain  root  access. 


action  local-setuid-buffer-overflow  is 

intruder  preconditions 

plvl(T )  =  user  User-level  privileges  on  host  T 

network  preconditions 

vul-aty  There  is  a  vulnerable  at  executable 

intruder  effects 

plvl(T )  :=  root  Root-level  privileges  on  host  T 

network  effects 


end 


No  changes  to  the  network  component 
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10.2  Sample  Attack  Graphs 

We  have  implemented  an  attack  graph  toolkit  using  scenario  graph  algorithms  in  Paid  I  and  network 
attack  models  defined  in  this  chapter.  We  describe  the  toolkit  in  Chapter  11.  Figure  10.2  shows  a 
screenshot  of  the  attack  graph  generated  with  the  toolkit  for  the  security  property 

G(intruder.privilege[lin\  <  root ) 

which  states  that  the  intruder  will  never  attain  root  privileges  on  the  Linux  host.  In  Figure  10.2, 
a  sample  attack  scenario  is  highlighted  with  solid  square  nodes,  with  each  attack  step  identified 
by  name  and  CVE  number.  Since  the  external  firewall  restricts  most  network  connections  from  the 
outside,  the  intruder  has  no  choice  with  respect  to  the  initial  step — it  must  be  a  buffer  overflow  action 
on  the  IIS  Web  server.  Once  the  intruder  has  access  to  the  Web  server  machine,  his  options  expand. 
The  highlighted  scenario  is  the  shortest  route  to  success.  The  intruder  uses  the  Web  server  machine 
to  launch  a  port  scan  via  the  vulnerable  Squid  proxy  running  on  the  Linux  host.  The  scan  discovers 
that  it  is  possible  to  obtain  user  privileges  on  the  Linux  host  with  the  LICQ  exploit.  After  that,  a 
simple  local  buffer  overflow  gives  the  intruder  administrative  control  over  the  Linux  machine.  The 
last  transition  in  the  action  path  is  a  bookkeeping  step,  signifying  the  intruder’s  success. 

Any  information  explicitly  represented  in  the  model  is  available  for  inspection  and  analysis  in 
the  attack  graph.  Lor  instance,  with  a  few  clicks  we  arc  able  to  highlight  portions  of  the  graph 
“covered”  by  the  intrusion  detection  system.  Ligure  10.3  shades  the  nodes  where  the  IDS  alarm  has 
been  sounded.  These  nodes  lie  on  paths  that  use  the  LICQ  action  along  a  network  path  monitored  by 
the  IDS.  It  is  clear  that  while  a  substantial  portion  of  the  graph  is  covered  by  the  IDS,  the  intruder 
can  escape  detection  and  still  succeed  by  taking  one  of  the  paths  on  the  right  side  of  the  graph. 
Once  such  attack  scenario  is  highlighted  with  square  nodes  in  Ligure  10.3.  It  is  very  similar  to  the 
attack  scenario  discussed  in  the  previous  paragraph,  except  that  the  LICQ  action  is  launched  from 
the  internal  Windows  machine,  where  the  intrusion  detection  system  does  not  see  it.  To  prepare  for 
launching  the  LICQ  action  from  the  Windows  machine,  an  additional  step  is  needed  to  obtain  user 
privileges  in  the  machine.  Lor  that,  the  intruder  uses  the  client  scripting  exploit  on  the  Windows 
host  immediately  after  taking  over  the  Web  machine. 

10.2.1  Sample  Attack  Graph  Analysis 

To  demonstrate  some  of  the  analysis  techniques  discussed  in  Chapter  9,  we  expanded  the  example 
from  Section  10.1  with  an  extra  host  User  on  the  external  network  and  several  new  actions.  An 
authorized  user  W  of  the  internal  network  owns  the  new  host  and  uses  it  as  a  terminal  to  work 
remotely  on  the  internal  Windows  host.  The  new  actions  permit  the  intruder  to  take  over  the  host 
User,  sniff  user  W's  login  credentials,  and  log  in  to  the  internal  Windows  host  using  the  stolen 
credentials.  We  omit  the  details  of  the  new  actions,  as  they  are  not  essential  to  understanding  the 
examples.  Ligure  10.4(a)  shows  the  full  graph  for  the  modified  example.  The  graph  is  significantly 
larger,  reflecting  the  expanded  number  of  choices  available  to  the  intruder. 


Single  Action  Removal 

A  simple  kind  of  analysis  determines  the  impact  of  removing  one  action  from  the  intruder’s  arsenal. 
Recall  from  Chapter  8  that  each  action  is  a  triple  (r,  hs,  hi),  where  hs  G  H  is  the  host  from  which 
the  attack  is  launched,  ht  £  H  is  the  host  targeted  by  the  attack,  and  r  is  an  action  rule.  The  user 
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Begin 


Figure  10.3:  Alternative  Attack  Scenario  Avoiding  the  IDS 


specifies  a  set  Arem  of  action  triples  to  be  removed  from  the  attack  graph.  The  toolkit  removes  the 
transitions  corresponding  to  each  triple  in  the  set  Arem  and  then  reconstructs  the  attack  graph  using 
the  regular  explicit  state  algorithm  GenerateScenarioAutomaton. 

As  demonstrated  in  Figure  10.4,  this  procedure  can  be  repeated  several  times,  reducing  the  size 
of  the  attack  graph  at  each  step.  The  full  graph  in  Figure  10.4(a)  has  362  states.  Removing  one  of 
two  ways  the  intruder  can  sniff  user  W’ s  login  credentials  produces  the  graph  in  Figure  10.4(b),  with 
213  states.  Removing  one  of  the  local  buffer  overflow  actions  produces  the  graph  in  Figure  10.4(c), 
with  66  states.  At  each  step,  the  user  is  able  to  judge  visually  the  impact  of  removing  a  single  action 
from  the  intruder’s  arsenal. 

Critical  Attack  Sets 

Recall  from  Section  9.1  that  once  an  attack  graph  is  generated,  the  GREEDY-CRITICAL-SET 
algorithm  can  find  an  approximately-optimal  set  of  actions  that  will  completely  disconnect  the  ini¬ 
tial  and  final  states  of  the  graph.  Similarly,  the  GREEDY-MEASURE-SET  algorithm  can  find  an 
approximately-optimal  set  of  security  measures  that  accomplish  the  same  goal.  With  a  single  click, 
the  user  can  invoke  both  of  these  exposure  minimization  algorithms. 

The  effect  of  GREEDY-CRITICAL-SET  on  the  modified  example  attack  graph  is  shown  in 
Figure  10.5(a).  The  algorithm  finds  a  critical  action  set  of  size  1,  containing  the  port  scan  action 
exploiting  the  Squid  web  proxy.  The  graph  nodes  and  edges  corresponding  to  actions  in  the  critical 
set  computed  by  the  algorithm  are  highlighted  in  the  toolkit  by  shading  the  relevant  nodes.  The 
shaded  nodes  are  seen  clearly  when  we  zoom  in  to  inspect  a  part  of  the  graph  on  a  larger  scale 
(Figure  10.5(b)). 

Since  the  computed  action  set  is  always  critical,  removing  every  action  triple  in  the  set  from  the 
intruder’s  arsenal  is  guaranteed  to  result  in  an  empty  attack  graph.  In  the  example,  we  might  patch 
the  Linux  machine  with  a  new  version  of  the  Squid  proxy,  thereby  removing  every  action  triple 
that  uses  the  Squid  port  scan  rule  on  the  Linux  machine  from  the  intruder’s  arsenal. 


(a) 


(b) 


(c) 


Figure  10.4:  Reducing  Action  Arsenal 


Figure  10.5:  Finding  Critical  Action  Sets 


Chapter  11 


Attack  Graph  Toolkit 


We  have  implemented  a  toolkit  for  generating  and  exploring  attack  graphs,  using  scenario  graph 
algorithms  in  Paid  I  and  network  attack  models  defined  Chapter  8.  In  this  chapter  we  describe  the 
toolkit  and  show  several  ways  to  integrate  it  with  external  data  sources  that  supply  information 
necessary  to  build  a  network  attack  model.  Specifically,  it  is  necessary  to  know  the  topology  of  the 
target  network,  configuration  of  the  network  hosts,  and  vulnerabilities  present  on  the  network.  In 
addition,  we  require  access  to  a  database  of  attack  rules  to  build  the  transition  relation  of  the  attack 
model.  We  could  expect  the  user  to  specify  all  of  the  necessary  information  manually,  but  such  a 
task  is  tedious,  error-prone,  and  unrealistic  for  networks  of  more  than  a  few  nodes. 

We  recommend  deploying  the  attack  graph  toolkit  in  conjunction  with  information-gathering 
systems  that  supply  some  of  the  data  automatically.  We  integrated  the  attack  graph  generator  with 
two  such  systems,  MITRE  Corp's  Outpost  and  Lockheed  Martin’s  ANGI.  We  report  on  our  experi¬ 
ence  with  Outpost  and  ANGI  in  Sections  1 1.4  and  1 1.5. 


11.1  Toolkit  Architecture 

Figure  11.1  shows  the  architecture  of  the  attack  graph  toolkit.  There  are  three  main  pieces:  a 
network  model  builder,  a  scenario  graph  generator,  and  a  graphical  user  interface  (GUI).  The 
network  model  builder  takes  as  input  information  about  network  topology,  configuration  data  for 
each  networked  host,  and  a  library  of  attack  rules.  It  constructs  a  finite  model  of  the  network 
suitable  for  automated  analysis.  The  model  is  augmented  with  a  security  specification,  which  spells 
out  the  security  requirements  against  which  the  attack  graph  is  to  be  built.  The  model  and  the 
security  specification  then  go  to  the  second  piece,  the  scenario  graph  generator. 

The  scenario  graph  generator  is  a  general-purpose  tool  based  on  the  algorithms  developed  in 
Paid  I  (Section  3.3).  It  takes  any  finite  model  and  correctness  specification  and  produces  a  graph 
composed  of  possible  executions  of  the  model  that  violate  the  correctness  specification.  The  model 
builder  constructs  the  input  to  the  graph  generator  so  that  the  output  will  be  the  desired  attack  graph. 
The  graphical  user  interface  lets  the  user  display  and  examine  the  graph. 


11.2  The  Model  Builder 

Recall  from  Section  8.1  that  a  network  attack  model  consists  of  six  primary  components: 
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Figure  11.1:  Toolkit  Architecture 

1 .  H,  a  set  of  hosts  connected  to  the  network 

2.  C,  a  connectivity  relation  expressing  the  network  topology  and  inter-host  reachability 

3.  T,  a  relation  expressing  trust  between  hosts 

4.  I,  a  model  of  the  intruder 

5.  A,  a  set  of  individual  attack  actions 

6.  Ids,  a  model  of  the  intrusion  detection  system 

To  construct  each  of  the  six  components,  the  model  builder  needs  to  collect  the  following  pieces  of 
information.  For  the  entire  network,  we  need: 

1 .  The  set  of  hosts  H 

2.  The  network  topology  and  firewall  rules,  which  together  induce  the  connectivity  relation  C 

3.  The  initial  state  of  the  trust  relation  T:  which  hosts  are  trusted  by  other  hosts  prior  to  any 
intruder  action 

Several  pieces  of  data  are  required  for  each  host  h  in  the  set  H : 

4.  A  unique  host  identifier  (usually  name  and  network  address) 

5.  Operating  system  vendor  and  version 

6.  Active  network  services  with  port  numbers 

7.  Common  Vulnerabilities  and  Exposures  IDs  of  all  vulnerabilities  present  on  h 

8.  User-specific  configuration  parameters 
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Finally,  for  each  CVE  vulnerability  present  on  at  least  one  host  in  the  set  H ,  we  need: 

9.  An  attack  rule  with  preconditions  and  effects 

We  designed  an  XML-based  format  covering  all  of  the  information  that  the  model  builder  requires. 
The  XML  format  lets  the  user  specify  each  piece  of  information  manually  or  indicate  that  the  data 
can  be  gathered  automatically  from  an  external  source.  A  typical  description  of  a  host  in  XML  is  as 
follows: 


1 

9 

chost  id="typical-machine" 

ip=" 192 .168 . 0 . 1"> 

Zi 

3 

<services> 

4 

<ftp  port="21"/> 

5 

<W3SVC  port=" 8 0 " /> 

6 

7 

8 

</services> 

<connectivity> 

9 

Cremote  id="machinel " 

<ftp/>  <W3SVC/>  </remote> 

10 

Cremote  id="machine2 "> 

<sshd/>  <W3SVC/>  </remote> 

11 

Cremote  id="machine3"> 

<sshd/>  </remote> 

12 

13 

</connectivity> 

14 

<cve> 

15 

CCAN-2 002-03 64 /> 

16 

CCAN-2  002-0147/> 

17 

18 

</cve> 

19 

</host> 

The  example  description  provides  the  host  name  and  network  identification  (line  1),  a  list  of 
active  services  with  port  numbers  (lines  3-6),  the  paid  of  the  connectivity  relation  that  involves  the 
host  (lines  8-12),  and  names  of  CVE  and  CVE-candidate  (CAN)  vulnerabilities  known  to  be  present 
on  the  host  (lines  14-17).  Connectivity  is  specified  as  a  list  of  services  that  the  host  can  reach  on 
each  remote  machine.  Lines  9-11  each  specify  one  remote  machine;  e.g.,  typical-machine 
can  reach  machinel  on  ports  assigned  to  the  ftp  and  W3SVC  (IIS  Web  Server)  services. 

It  is  unrealistic  to  expect  the  user  to  collect  and  specify  all  of  the  data  by  hand.  In  Sections  11.3- 
11.5  we  discuss  three  external  data  sources  that  supply  some  of  the  information  automatically:  the 
Nessus  vulnerability  scanner,  MITRE  Corp.’s  Outpost,  and  Lockheed  Martin’s  ANGI.  Whenever 
the  model  builder  can  get  a  specific  piece  of  information  from  one  of  these  sources,  a  special  tag  is 
placed  in  the  XML  file.  If  Nessus,  Outpost  and  ANGI  arc  all  available  at  the  same  time  as  sources 
of  information,  the  above  host  description  may  look  as  follows: 

<host  id="typ±cal-machine"  ip=" | Outpost | "> 

<services  source=" | Outpost | " /> 
cconnectivity  source=" |ANGI | "/> 

<cve  source=" | Nessus | " /> 


</host> 


84 


CHAPTER  11.  ATTACK  GRAPH  TOOLKIT 


The  model  builder  gets  the  host  network  address  and  the  list  of  running  services  from  Outpost, 
connectivity  information  from  ANGI,  and  a  list  of  existing  vulnerabilities  from  Nessus.  Once  all  of 
the  relevant  information  is  gathered,  the  model  builder  creates  a  finite  model  and  encodes  it  in  the 
input  language  of  the  scenario  graph  generator.  The  scenario  graph  generator  then  builds  the  attack 
graph. 


11.3  Attack  Graphs  with  Nessus 

A  savvy  attacker  might  use  one  of  the  many  widely  available  vulnerability  scanners  [23]  to  discover 
facts  about  the  network  and  construct  an  attack  scenario  manually.  Similarly,  an  attack  graph  gener¬ 
ator  can  use  a  scanner  to  construct  such  scenarios  automatically.  Our  attack  graph  toolkit  works  with 
the  freeware  vulnerability  scanner  Nessus  [30]  to  gather  information  about  reachable  hosts,  services 
running  on  those  hosts,  and  any  known  exploitable  vulnerabilities  that  can  be  detected  remotely. 

The  scanner  has  no  internal  knowledge  of  the  target  hosts,  and  will  usually  discover  only  paid  of 
the  information  necessary  to  construct  a  graph  that  includes  every  possible  attack  scenario.  Using 
only  an  external  vulnerability  scanner  to  gather  information  can  lead  the  system  administrator  to 
miss  important  attack  scenarios. 

Nevertheless,  the  administrator  can  run  vulnerability  scanners  against  his  own  network  to  find 
out  what  a  real  attacker  would  discover.  In  the  future,  sophisticated  intruders  are  likely  to  use  attack 
graph  generators  to  help  them  devise  attack  scenarios.  As  a  paid  of  network  security  strategy,  we  rec¬ 
ommend  running  a  vulnerability  scanner  in  conjunction  with  an  attack  graph  generator  periodically 
to  discover  avenues  of  attack  that  are  most  likely  to  be  exploited  in  practice. 


11.4  Attack  Graphs  with  MITRE  Outpost 

MITRE  Corporation’s  Outpost  is  a  system  for  collecting,  organizing,  and  maintaining  security- 
related  information  on  computer  networks.  It  is  a  suite  of  inter-related  security  applications  that 
share  a  common  data  model  and  a  common  data  collection  infrastructure.  The  goal  of  Outpost  is  to 
provide  a  flexible  and  open  environment  for  network  and  system  administrators  to  monitor,  control, 
and  protect  computer  systems. 

At  the  center  of  the  Outpost  System  is  a  data  collection/probe  execution  engine  that  gathers 
specific  configuration  information  from  all  of  the  systems  within  a  network.  The  collected  data 
is  stored  in  a  central  database  for  analysis  by  the  Outpost  applications.  Outpost  collects  data  about 
individual  hosts  only,  so  it  cannot  provide  information  about  network  topology  or  attack  rules.  Since 
Outpost  stores  all  of  the  data  in  a  network-accessible  SQL  database,  we  retrieve  the  data  directly 
from  the  database,  without  talking  to  the  Outpost  server,  as  shown  in  Figure  11.2. 

Currently  Outpost  works  with  SQL  databases  supported  by  Microsoft  and  Oracle.  Both  of  these 
packages  use  a  proprietary  Application  Programming  Interface.  The  model  builder  includes  an 
interface  to  each  database,  as  well  as  a  generic  module  that  uses  the  Open  DataBase  Connectivity 
interface  (ODBC)  and  works  with  any  database  that  supports  ODBC.  Furthermore,  it  is  easy  to  add 
a  capability  to  interface  with  other  types  of  databases. 

An  Outpost-populated  database  contains  a  list  of  client  hosts  monitored  by  the  Outpost  server. 
For  the  model  builder,  the  Outpost  server  can  provide  most  of  the  required  information  about  each 
individual  host  h,  including: 

1 .  A  unique  host  identifier  (usually  name  and  network  address) 
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Figure  11.2:  Outpost  Architecture 


2.  Operating  system  vendor  and  version 

3.  Active  network  services  with  port  numbers 

4.  Common  Vulnerabilities  and  Exposures  IDs  of  all  vulnerabilities  present  on  h 

5.  User-specific  configuration  parameters  (e.g.,  is  Javascript  enabled  for  the  user’s  email  client?) 

Outpost’s  lists  of  CVE  vulnerabilities  are  usually  incomplete,  and  it  does  not  keep  track  of  some  of 
the  user-specific  configuration  parameters  required  by  the  attack  graph  toolkit.  Until  these  deficien¬ 
cies  are  fixed,  the  user  must  provide  the  missing  information  manually. 

In  the  future,  the  Outpost  server  will  inform  the  attack  graph  toolkit  whenever  changes  are  made 
to  the  database.  The  tighter  integration  with  Outpost  will  enable  attack  graph  toolkit  to  re-generate 
attack  graphs  automatically  every  time  something  changes  in  the  network  configuration. 


11.5  Attack  Graphs  with  Lockheed’s  ANGI 

Lockheed  Martin  Advanced  Technology  Laboratory’s  (ATL)  Next  Generation  Infrastructure  (ANGI) 
IR&D  project  is  building  systems  that  can  be  deployed  in  dynamic,  distributed,  and  open  network 
environments.  ANGI  collects  local  sensor  information  continuously  on  each  network  host.  The  sen¬ 
sor  data  is  shared  among  the  hosts,  providing  dynamic  awareness  of  the  network  status  to  each  host. 
ANGI  sensors  gather  information  about  host  addresses,  host  programs  and  services,  and  network 
topology.  In  addition,  ANGI  supports  vulnerability  assessment  sensors  for  threat  analysis. 

Two  distinguishing  features  of  ANGI  are  the  ability  to  discover  network  topology  changes  dy¬ 
namically  and  focus  on  technologies  for  pro-active,  automated  repair  of  network  problems.  ANGI 
is  capable  of  providing  the  attack  graph  model  builder  with  network  topology  information,  which  is 
not  available  in  Outpost  and  is  not  gathered  by  Nessus. 

We  tested  our  attack  graph  toolkit  integrated  with  ANGI  on  a  testbed  of  five  hosts  with  combina¬ 
tions  of  the  five  CVE  vulnerabilities  specified  for  the  example  model  in  Chapter  10  (p.  72),  and  one 
adversary  host.  Figure  11.3  is  a  screenshot  of  the  testbed  network  schematic.  The  intruder  resides 
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Figure  11.3:  ANGI  Network 

on  the  host  lindenwold.  Hosts  trenton  and  mtholly  run  firewalls,  which  are  initially  dis¬ 
abled.  We  assume  that  the  target  of  the  intruder  is  the  host  shamong,  which  contains  some  critical 
resource. 

ANGI  combines  information  about  each  host  with  data  from  firewall  configuration  files  into  a 
single  XML  document.  To  convert  firewall  rules  into  a  reachability  relation  C  accepted  by  the  attack 
graph  toolkit,  ANGI  uses  a  package  developed  at  MITRE  Corp.  that  computes  network  reachability 
from  packet  filter  data  [78].  The  XML  file  specifies  explicitly  five  attack  rules  corresponding  to 
the  CVE  vulnerabilities  present  on  the  hosts.  ANGI  then  calls  the  model  builder  with  the  XML 
document  and  a  security  property  as  inputs.  The  security  property  specifies  a  guarantee  of  protection 
for  the  critical  resource  host  shamong: 

G (intruder. privilege[shamong\  <  root ) 

The  attack  graph  generator  finds  several  potential  attack  scenarios.  Figure  11.4  shows  the  attack 
graph  as  it  is  displayed  by  the  graphical  user  interface.  The  graph  consists  of  19  nodes  with  28 
edges. 

Exploring  the  attack  graph  reveals  that  several  successful  attack  scenarios  exploit  the  LICQ 
vulnerability  on  the  host  shamong.  One  such  attack  scenario  is  highlighted  in  Figure  11.4.  As 
indicated  in  the  “Path  Info”  pane  on  the  left  of  Figure  1 1.4,  the  second  step  of  the  highlighted  sce¬ 
nario  exploits  the  LICQ  vulnerability  on  shamong.  This  suggests  a  possible  strategy  for  reducing 
the  size  of  the  graph.  Using  the  ANGI  interface,  we  enable  the  firewall  on  the  host  trenton,  and 
add  a  rule  that  blocks  all  external  traffic  at  trenton  from  reaching  shamong  on  the  LICQ  port. 
ANGI  then  generates  a  new  XML  model  file  reflecting  this  change.  The  new  graph  demonstrates 
a  significant  reduction  in  network  exposure  from  this  relatively  small  change  in  network  configura¬ 
tion.  The  modification  reduces  graph  size  to  7  nodes  and  6  edges.  Figure  1 1.5  shows  the  graph  for 
the  modified  network. 

Looking  at  the  scenarios  in  the  new  graph,  we  discover  that  the  attacker  can  still  reach  shamong 
by  first  compromising  the  web  server  on  cherryhill.  Since  we  do  not  want  to  disable  the  web 
server,  we  enable  the  firewall  on  mtholly  and  add  a  rule  specifically  blocking  cherryhill’s 
access  to  the  LICQ  client  on  shamong.  Yet  another  invocation  of  the  attack  graph  generator  on  the 
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Figure  11.4:  ANGI  Attack  Graph  -  No  Firewalls 
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Figure  11.5:  ANGI  Attack  Graph  -  One  Firewall 
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modified  model  produces  an  empty  attack  graph  and  confirms  that  we  have  successfully  safeguarded 
shamong  while  retaining  the  full  functionality  of  the  network. 
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Chapter  12 


Attack  Graphs:  Related  Work 


Many  of  the  ideas  that  we  investigate  in  Part  II  have  been  suggested  or  considered  in  existing  work 
in  the  computer  security  field.  In  this  chapter  we  survey  the  related  work. 

Vulnerability  scanning  tools  form  a  necessary  paid  of  any  complete  system  for  network  vul¬ 
nerability  analysis,  as  they  provide  the  input  to  build  the  network  model.  There  arc  a  variety  of 
commercial  and  open  source  tools  that  scan  single  hosts  for  known  vulnerabilities.  Several  promi¬ 
nent  ones  arc  COPS  [1],  CyberCop  [5],  and  Nessus  [30].  Each  host  running  network  services 
exposes  vulnerabilities  found  by  these  scanners  to  hosts  outside  the  network.  Such  exposures  could 
be  benign  or  hazardous  in  nature.  However,  plugging  all  the  vulnerabilities  based  on  the  output  of  a 
scanning  tool  may  render  a  network  unusable  to  bona-fide  users. 

The  focus  of  the  work  in  Paid  II  of  this  thesis  is  on  chaining  together  the  vulnerabilities  un¬ 
covered  by  such  tools  to  uncover  end-to-end  attack  scenarios  and  thus  discover  the  nature  of  the 
exposed  vulnerabilities.  Two  distinct  lines  of  work  relate  to  our  contributions  to  network  security 
research:  constructing  multi-stage  penetration  scenarios,  and  building  and  analyzing  entire  attack 
graphs.  Both  lines  of  research  have  employed  model  checking  technology.  We  start  with  the  work 
on  single  attack  scenarios. 

Templeton  and  Levitt  [91]  propose  a  requires/provides  model  for  modeling  chains  of  network 
exploits.  The  model  links  exploits  into  scenarios,  with  earlier  exploits  supplying  the  prerequisites 
for  the  later  ones.  The  requires  paid  corresponds  to  our  action  preconditions,  and  the  provides  paid 
corresponds  to  our  action  effects.  They  describe  a  language  for  specifying  exploits  and  point  out 
that  relating  seemingly  innocuous  system  behavior  to  known  attack  scenarios  can  help  discover  new 
exploits. 

Templeton  and  Levitt  do  not  consider  combining  their  attack  scenarios  into  attack  graphs;  how¬ 
ever,  their  ideas  on  reducing  the  complexity  of  the  attack  tree  can  be  used  within  our  model  to 
reduce  the  size  of  the  attack  graph.  We  already  abstract  multiple  vulnerabilities  on  a  host,  with  sim¬ 
ilar  effects,  to  a  generic  vulnerability  and  hence  simplify  the  analysis.  In  future  work,  we  would  like 
to  abstract  a  group  of  hosts  with  similar  abstract  vulnerabilities  as  a  single  node  to  further  reduce 
model  complexity. 

Ramakrishnan  and  Sekar  use  a  model  checker  to  analyze  single  host  systems  with  respect  to 
combinations  of  unknown  vulnerabilities  [77].  Going  further,  Ritchey  and  Ammann  use  a  model 
checker  to  construct  single  attack  scenarios  to  test  heterogeneous  multi-host  networks  [79]  with 
respect  to  known  exploits,  such  as  ones  found  on  Bugtraq  or  ICAT  Metabase.  They  use  the  (un¬ 
modified)  model  checker  SMV  [85]  to  construct  the  scenarios.  Both  Ramakrishnan  and  Sekar  and 
Ritchey  and  Ammann  obtain  only  one  counter-example;  that  is,  only  one  attack  scenario  ending  in 
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a  state  that  violates  the  security  specification. 

Cuppens  and  Ortalo  [26]  propose  a  declarative  language  (LAMBDA)  for  specifying  attacks  in 
terms  of  pre-  and  post-conditions.  LAMBDA  is  a  superset  of  the  simple  language  we  used  to  model 
attacks  in  our  work.  The  language  is  modular  and  hierarchical;  higher-level  attacks  can  be  described 
using  lower-level  attacks  as  components.  LAMBDA  also  includes  intrusion  detection  elements. 
Attack  specifications  includes  information  about  the  steps  needed  to  detect  the  attack  and  the  steps 
needed  to  verify  that  the  attack  has  already  been  carried  out.  Using  a  database  of  attacks  specified 
in  LAMBDA,  Cuppens  and  Miege  [25]  propose  a  method  for  alert  correlation  based  on  matching 
post-conditions  of  some  attacks  with  pre-conditions  of  other  attacks  that  may  follow.  In  effect,  they 
exploit  the  fact  that  alerts  about  attacks  arc  more  likely  to  be  related  if  the  corresponding  attacks 
can  be  a  part  of  the  same  attack  scenario.  It  would  be  interesting  to  apply  similar  alert  correlation 
techniques  to  full  scenario  graphs. 

In  the  references  listed  so  far,  the  output  is  limited  to  a  single  attack  scenario  ending  in  a  state 
that  violates  the  security  specification.  Their  focus  is  on  building  the  network  model  and  on  creating 
flexible  and  portable  languages  for  specifying  attack  steps.  In  contrast,  we  use  model  checkers 
NuSMV  and  SPIN  to  produce  full  attack  graphs,  representing  all  possible  attack  scenarios.  We  also 
describe,  in  Chapter  9  and  elsewhere  [52,  80],  post-generation  analyses  of  attack  graphs  that  help 
administrators  make  decisions  about  improving  network  security.  These  analyses  cannot  be  used  on 
single  attack  scenarios. 

Graph-based  data  structures  have  been  used  in  network  intrusion  detection  systems,  such  as 
NetSTAT  [94],  There  arc  two  major  components  in  NetSTAT,  a  set  of  probes  placed  at  different 
points  in  the  network  and  an  analyzer.  The  analyzer  processes  events  generated  by  the  probes  and 
generates  alarms  by  consulting  a  network  fact  base  and  a  scenario  database.  The  network  fact 
base  contains  information  (such  as  connectivity)  about  the  network  being  monitored.  The  scenario 
database  has  a  directed  graph  representation  of  various  atomic  attacks.  For  example,  the  graph 
corresponding  to  an  IP  spoofing  attack  shows  various  steps  that  an  intruder  takes  to  mount  that 
specific  attack.  The  authors  state  that  “in  the  analysis  process  the  most  critical  operation  is  the 
generation  of  all  possible  instances  of  an  attack  scenario  with  respect  to  a  given  target  network.” 
As  stated  in  the  introduction  to  Part  II,  this  is  an  important  rationale  for  our  application  of  model 
checking  to  attack  graphs. 

The  earliest  work  in  producing  full  attack  graphs  is  the  Kuang  system  [7] ,  and  its  extension  to  a 
network  environment,  the  NetKuang  system  [99].  In  these  systems  a  backward,  goal -based  search 
scans  UNIX  systems  for  poor  configurations.  The  output  is  a  combination  of  operations  that  lead  to 
compromise. 

Dacier  [27]  proposes  a  concept  of  privilege  graphs.  Each  node  in  the  privilege  graph  represents 
a  set  of  privileges  owned  by  the  user;  edges  represent  vulnerabilities.  Privilege  graphs  arc  then 
explored  to  construct  attack  state  graphs ,  which  represents  different  ways  in  which  an  intruder 
can  reach  a  certain  goal,  such  as  root  access  on  a  host.  Dacier  also  defines  a  metric,  called  the 
mean  effort  to  failure  or  METF.  METF  is  a  is  probabilistic  metric  based  on  assigning  likelihoods 
to  attacks.  Building  on  this  work,  Dacier  et  al  [28]  also  use  graph  analysis  for  network  security 
evaluation.  Later,  Ortalo  et  al.  describe  an  experimental  evaluation  of  a  security  framework  based 
on  these  ideas  [69]. 

Phillips  and  Swiler  [72]  propose  the  concept  of  attack  graphs  that  is  closest  to  the  one  described 
in  Part  II.  If  we  have  a  finite  state  model  of  the  system  and  restrict  our  attention  to  safety  properties, 
an  attack  graph  may  be  constructed  by  doing  simple  forward  search.  Starting  with  the  initial  states 
of  the  model  M,  we  use  a  graph  traversal  procedure  (e.g.,  depth  first  search)  to  find  all  reachable 
fail  states  where  the  security  property  P  is  violated.  The  attack  graph  is  then  the  union  of  all  paths 
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from  initial  states  to  fail  states.  Swiler  et.  al.  [89]  use  this  strategy  in  their  tool  for  generating  attack 
graphs.  Their  tool  constructs  the  attack  graph  by  forward  exploration  starting  from  the  initial  state. 
The  advantage  of  using  model  checking  instead  of  forward  search  is  that  the  technique  can  handle 
any  security  property  expressible  in  temporal  logic,  including  liveness  properties. 

Swiler  et.  al.  eliminate  redundant  paths  from  their  attack  graphs  with  what  amounts  to  an 
assumption  of  monotonicity,  namely  that  the  order  of  attacks  does  not  matter.  Their  approach  is 
fundamentally  related  to  partial-order  reduction  techniques  [92,  40,  45,  65,  71].  While  it  does  not 
eliminate  the  fundamental  exponential  character  of  attack  graphs,  it  does  help  control  the  visited 
parts  of  the  state  space.  Swiler  et.  al.  also  consider  cost  functions,  which  we  do  not.  It  would  be 
interesting  to  apply  their  techniques  to  our  model. 

Dawkins  et.  al.  [29]  specify  another  language  for  modeling  exploits,  and  they  use  this  language 
to  provide  a  hierarchical  view  of  attack  trees.  The  hierarchy  helps  present  information  to  the  user  in 
a  more  manageable  way,  but  the  approach  does  not  have  the  goal  directed  aspect  of  the  other  refer¬ 
ences  listed  here.  That  is,  their  procedure  results  in  a  graph  representing  all  possible  compromises, 
and  not  just  those  of  interest  to  a  specific  intruder  goal. 

On  the  surface,  the  notion  of  attack  graphs  proposed  by  Dacier  and  the  one  used  by  Phillips  and 
Swiler  are  similar  to  ours.  However,  both  of  those  attack  graph  definitions  take  an  “attack-centric” 
view  of  the  world.  As  pointed  out  above,  our  attack  graphs  arc  more  general.  Since  we  work  with 
a  general  modeling  language,  the  actions  in  our  model  can  be  both  seemingly  benign  system  events 
(such  as  failure  of  a  link)  and  malicious  events  (such  as  attacks).  Furthermore,  in  the  future  it  will 
be  easy  to  incorporate  routine  behavior  of  network  users  into  the  model  and  explore  the  interaction 
between  legitimate  and  malicious  activity. 

Ammann  et.  al.  address  the  exponential  nature  of  the  attack  graphs  common  to  Dacier,  Phillips 
and  Swiler,  and  our  work.  They  present  an  implicit,  scalable  attack  graph  representation  that  avoids 
the  exponential  blow-up  of  explicit  attack  graphs  [4] .  Attack  graphs  are  encoded  as  dependencies 
among  exploits  and  security  conditions,  under  the  assumption  of  monotonicity.  Informally,  mono¬ 
tonicity  means  that  no  action  an  intruder  can  take  interferes  with  the  intruder’s  ability  to  take  any 
other  actions.  The  authors  treat  vulnerabilities,  intruder  access  privileges,  and  network  connectivity 
as  atomic  boolean  attributes,  and  actions  as  atomic  transformations  that,  given  a  set  of  preconditions 
on  the  attributes,  establish  a  set  of  postconditions.  In  this  model,  monotonicity  means  that  (1)  once 
a  postcondition  is  satisfied,  it  can  never  become  ’unsatisfied’,  and  (2)  the  negation  operator  cannot 
be  used  in  expressing  action  preconditions. 

The  authors  show  that  under  the  monotonicity  assumption  it  is  possible  to  construct  an  efficient 
(low-order  polynomial)  attack  graph  representation  that  scales  well.  They  present  an  efficient  algo¬ 
rithm  for  extracting  minimal  attack  scenarios  from  the  representation,  and  suggest  that  a  standard 
graph  algorithm  can  produce  a  critical  set  of  actions  that  disconnects  the  goal  state  of  the  intruder 
from  the  initial  state. 

This  approach  is  less  general  than  our  treatment  of  attack  graphs.  In  addition  to  the  mono¬ 
tonicity  requirement,  it  can  handle  only  simple  safety  properties.  Further,  the  compact  attack  graph 
representation  is  less  explicit,  and  therefore  harder  for  a  human  to  read.  The  advantage  of  their  ap¬ 
proach  is  that  it  has  a  polynomial  worst-case  bound  on  the  size  of  the  graph,  and  therefore  in  theory 
can  scale  better  to  large  networks.  It  is  also  possible  to  perform  certain  useful  analyses  without 
instantiating  the  full  attack  graph. 

Lye  and  Wing  [62]  present  a  game-theoretic  method  for  analyzing  the  security  of  computer 
networks.  Their  model  is  similar  to  the  one  sketched  in  Appendix  A.  They  view  the  interactions  be¬ 
tween  an  attacker  and  the  administrator  as  a  two-player  general-sum  stochastic  game  and  construct 
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a  model  for  the  game.  Using  a  non-linear  program,  they  compute  Nash  equilibria  or  best-response 
strategies  for  both  players  (attacker  and  administrator).  Their  work  is  complementary  to  ours:  they 
report  using  our  method  for  automatically  generating  attack  graphs  to  replicate  their  example,  which 
was  originally  generated  manually.  Thus,  our  model-checking  algorithm  can  be  used  as  the  first  step 
in  their  analysis,  to  generate  the  game  tree  automatically. 


Chapter  13 


Conclusions  and  Future  Work 


In  this  chapter  we  summarize  our  scientific  contributions  and  disscuss  avenues  for  further  research. 


13.1  Summary  of  Contributions 

Traditional  model  checkers  give  users  a  single  counterexample,  expecting  the  user  to  fix  the  prob¬ 
lem  and  run  the  model  checker  again.  Sometimes,  however,  repeatedly  checking  for  and  fixing 
individual  flaws  does  not  work  well.  We  developed  formal  techniques  that  give  the  user  access  to 
scenario  graphs  that  represent  the  full  set  of  faulty  behaviors.  Scenario  graphs  give  the  user  more 
control  over  the  design  refinement  process. 

In  Part  I  we  specified  scenario  graphs  formally  and  gave  definitions  for  sound,  exhaustive,  and 
succinct  scenario  graphs.  We  presented  two  algorithms  for  automatically  generating  full  scenario 
graphs  from  finite  models.  We  have  further  specified  a  related  algorithm  that  produces  smaller 
graphs  by  replacing  subsets  of  scenarios  with  representative  prefixes.  All  of  the  algorithms  have 
been  shown  to  produce  sound,  exhaustive,  and  succinct  graphs.  The  algorithms’  performance  is 
lineal-  in  the  size  of  the  reachable  state  space  of  the  model. 

Our  implementation  of  the  scenario  graph  algorithms  exhibited  linear  performance  in  empirical 
tests.  We  also  defined  and  implemented  the  traceback  mode,  a  new  technique  for  explicit  state  space 
search.  In  contrast  with  existing  methods  for  reducing  memory  requirements,  the  traceback  mode 
retains  collision  detection  functionality  in  the  hash  table  of  states.  Instead,  memory  savings  are 
achieved  at  the  cost  of  performance. 

In  Part  II  we  applied  our  formal  concepts  to  the  security  domain.  Building  on  the  foundation 
established  in  Part  I,  we  defined  attack  graphs,  an  application  of  scenario  graphs  to  represent  ways 
in  which  intruders  attack  computer  networks.  We  constructed  a  finite  model  of  computer  networks 
and  applied  scenario  graph  algorithms  to  the  model  to  generate  attack  graphs.  We  showed  how 
system  administrators  can  use  attack  graphs  to  check  how  vulnerable  their  systems  are  and  select 
additional  measures  to  improve  security. 

We  built  an  attack  graph  toolkit  to  support  our  generation  and  analysis  algorithms.  The  toolkit 
has  an  easy-to-use  graphical  user  interface.  We  integrated  our  tools  with  external  sources  to  populate 
our  network  attack  model  with  host  and  vulnerability  data  automatically.  The  toolkit  has  been  used 
in  collaboration  with  Lockheed  Martin  Advanced  Technologies  Laboratory  for  a  small  case  study. 

Any  formal  modeling  technique  has  advantages  and  disadvantages  when  applied  to  a  particular 
domain.  The  primary  difficulty  in  our  approach  is  in  collecting  the  information  used  to  build  the 
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network  model.  The  attack  graphs  we  produce  arc  only  as  good  as  the  inputs  to  the  model  builder. 
Every  approach  to  information  gathering  that  we  covered  in  Chapter  11  has  a  negative  side:  the 
Nessus  scanner  provides  incomplete  information.  Outpost  and  ANGI  place  onerous  requirements 
on  the  software  infrastructure,  and  hand-built  models  consume  too  many  man-hours  of  labor. 

A  related  difficulty  is  in  modeling  the  actions  of  the  attacker.  New  actions  arc  devised  all  the 
time,  and  if  we  omit  an  attacker  action,  then  the  attack  graph  will  give  the  user  an  incomplete  picture 
of  network  vulnerability.  These  limitations  arc  shared  by  other  formal  modeling  techniques.  Our 
formal  framework  at  least  gives  system  administrators  a  formal  basis  for  making  decisions  relative 
to  the  accuracy  of  the  input  model. 


13.2  Future  Work 

In  this  section  we  mention  a  few  areas  of  interest  where  attack  graph  technology  could  be  applied. 

13.2.1  Library  of  Actions 

Kannan,  Sridhar,  and  Wing  [54]  plan  to  specify  a  library  of  actions  based  on  a  vulnerability  database 
provided  to  us  by  the  Computer  Emergency  Response  Team  (CERT).  This  database  has  over  150  ac¬ 
tions  representing  many  published  CVEs.  Kannan  et  al.  used  a  subset  of  30  of  these  actions  as  input 
to  the  model  builder,  allowing  them  to  produce  attack  graphs  with  over  300  nodes  and  3000  edges 
in  a  few  minutes.  With  their  growing  library  of  actions,  they  intend  to  perform  more  systematic 
experiments:  on  different  network  configurations,  with  different  subsets  of  actions,  and  for  different 
attacker  goals.  Their  preliminary  results  indicate  that  it  is  difficult  to  get  useful  information  from 
large  graphs  by  manual  inspection.  It  is  therefore  essential  to  expand  and  implement  our  repertoire 
of  automated  analyses,  and  to  explore  techniques  for  reducing  graph  size. 

13.2.2  Alert  Correlation 

We  have  already  touched  upon  the  necessity  of  correlating  low-level  intrusion  detection  alerts  in 
Chapter  6.  Ning,  Cui,  and  Reeves  [67]  propose  an  approach  for  constructing  individual  attack 
scenarios  by  correlating  alerts  on  the  basis  of  prerequisites  and  consequences  of  actions.  Their 
method  correlates  alerts  by  (partially)  matching  the  consequence  of  some  previous  alerts  and  the 
prerequisite  of  some  later  ones. 

By  using  full  attack  graphs  instead  of  individual  scenarios,  we  can  further  raise  the  level  of 
reasoning  about  events  in  a  network.  Instead  of  looking  at  individual  events  in  the  network  and 
trying  to  determine  if  one  or  a  few  of  them  might  constitute  good  evidence  that  a  particular  attack 
is  in  progress,  we  focus  instead  on  the  intentions  of  the  intruder.  Recognizing  that  a  series  of 
alerts  corresponds  to  a  series  of  steps  that  could  lead  to  system  compromise  could  be  a  guide  to 
distinguishing  false  alarms  from  real  attacks. 

The  next  step  would  be  building  a  proof-of-concept  intrusion  detection  system  based  on  attack 
graphs.  Given  a  database  of  graphs  describing  all  likely  attack  scenarios,  the  system  would  match 
individual  alerts  to  actions  in  the  graphs.  Matching  successive  alerts  to  individual  paths  in  an  attack 
graph  dramatically  increases  the  likelihood  that  the  network  is  in  fact  under  attack.  The  IDS  would 
aggregate  alarms  to  reduce  the  volume  of  alert  information  waiting  for  analysis,  reduce  the  rate  of 
false  alarms,  and  try  to  predict  intruder  goals.  Knowledge  of  intruder  goals  and  likely  next  steps 
would  help  guide  defensive  response. 
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Figure  13.1:  Alert  Correlation  with  Attack  Graphs 

13.2.3  New  Attack  Discovery 

By  matching  the  parameters  of  seemingly  innocuous  system  configuration  with  action  prerequisites 
in  known  attack  scenarios,  it  is  possible  to  discover  that  the  configuration  can  be  turned  into  an 
exploit  that  will  effectively  plug  into  the  attack  scenario.  Templeton  and  Levitt  [91]  cite  an  example 
of  this  strategy  in  action.  By  analyzing  the  peculiarities  of  Ethernet  MAC  control  packets  and 
the  ARP  protocol  they  discovered  an  exploit  that  could  be  used  in  conjunction  with  one  of  their 
individual  attack  scenarios.  It  would  be  interesting  to  explore  this  technique  in  conjunction  with  full 
attack  graphs. 

13.2.4  Prediction  and  Automated  Response 

Most  of  the  attacks  earned  out  today  are  scripted,  with  an  entire  scenario  earned  out  in  quick 
succession.  With  existing  intrusion  detection  systems,  the  attack  is  usually  over  by  the  time  the 
system  administrator  investigates  the  alarms.  Only  automated  attack  response  can  hope  to  prevent 
such  attacks  before  they  finish. 

Prediction  is  a  natural  extension  of  alert  correlation  with  attack  graphs.  If  we  are  able  to  deter¬ 
mine  with  a  reasonable  degree  of  certainty  that  the  intruder  has  brought  the  system  to  a  particular 
state  in  the  attack  graph,  we  can  predict  his  immediate  actions  and,  quite  possibly,  his  ultimate  goals. 
Such  information  can  be  an  important  aid  as  the  system  administrator  deals  with  the  intrusion. 

13.2.5  Example:  Alert  Correlation,  Prediction,  and  Response 

We  illustrate  with  an  example  a  three-step  strategy  for  correlating  low-level  alerts  and  then  respond¬ 
ing  to  the  attack.  Going  back  to  the  network  we  considered  in  Chapter  10,  assume  that  we  have 
compiled  a  database  of  attack  graphs  corresponding  to  various  intruder  goals.  We  add  intrusion  sen¬ 
sors  to  all  hosts  in  the  DMZ  and  on  the  internal  network.  The  sensors  watch  the  network  traffic  and 
report  activity  that  could  indicate  that  an  attack  is  in  progress.  Such  sensors  often  have  high  false 
alarm  rates,  and  it  is  not  feasible  to  investigate  every  alarm.  However,  armed  with  pre-computed 
attack  graphs,  we  can  make  good  use  of  the  noisy  sensor  data. 
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Suppose,  for  instance,  that  we  receive  an  alert  indicating  that  a  remote  buffer  overflow  action 
may  have  been  executed  against  the  IIS  Web  Server,  followed  by  an  alert  about  a  possible  Squid 
portscan  attack  on  the  Linux  host.  The  first  step  is  to  match  the  alerts  to  the  attack  graphs  in 
our  database.  The  alerts  correspond  to  attacks  that  form  the  first  two  steps  of  an  attack  scenario 
in  the  attack  graph  that  we  saw  in  Section  10.2.  These  steps  arc  indicated  with  boxed  nodes  in 
Figure  13.1.  Despite  low  confidence  in  the  accuracy  of  the  individual  sensor  alerts,  we  now  have  a 
strong  indication  that  a  real  intrusion  is  in  progress. 

Next,  we  can  use  the  attack  graph  to  form  hypotheses  about  the  intruder’s  current  progress, 
his  likely  next  action,  and  his  ultimate  goal.  Assuming  that  the  intruder’s  progress  corresponds 
to  the  lower  of  the  three  boxed  states  in  Figure  13.1,  we  can  focus  our  attention  on  the  subset  of 
attack  scenarios  reachable  from  that  state.  The  intruder’s  likely  next  step  corresponds  to  one  of  the 
transitions  out  of  the  lower  boxed  state  to  a  double-circle  state. 

We  can  further  hypothesize  that  the  intruder  is  likely  to  take  the  shortest  route  to  his  goal, 
meaning  that  his  most  likely  next  step  is  the  leftmost  transition  (that  is,  a  remote  compromise  of  the 
LICQ  client  of  the  Linux  host).  The  IDS  software  can  now  attempt  to  disrupt  the  intruder  with 
an  appropriate  counter-action:  disabling  the  LICQ  client  on  the  Linux  host  or  closing  the  internal 
firewall  to  cut  off  the  intruder’s  link  to  the  host. 


13.3  Final  Word 

Attack  graphs  have  been  a  subject  of  study  by  security  researchers  for  several  years,  and  they  served 
as  the  original  motivation  for  our  work  on  scenario  graphs.  Based  on  the  inquiries  received  in  the 
course  of  our  work,  we  believe  that  the  attack  graph  application  will  continue  to  generate  interest 
in  the  future.  However,  the  field  is  wide  open  for  more  applications  of  scenario  graphs.  We  believe 
that  other  kinds  of  systems  can  benefit  from  comprehensive  modeling  of  faulty  behavior  and  hope 
to  see  more  research  in  this  area  in  the  near  future. 


Appendix  A 


Attack  Graphs  and  Game  Theory 


In  this  appendix  we  develop  game-theoretic  models  for  describing  systems  under  attack  and  show 
how  to  generate  attack  graphs  for  such  models  with  respect  to  particular  security  properties. 


A.l  Background:  Extensive-Form  Games 

Game  theory  uses  the  extensive  form  to  describe  games  where  the  players  make  multiple  moves 
and  base  their  strategy  at  each  turn  on  information  about  moves  that  have  already  occurred.  Our 
treatment  of  extensive-form  games  mostly  follows  [37],  Chapter  3;  exceptions  arc  noted  explicitly. 
Extensive  form  games  contain  the  following  information: 

1.  the  set  of  players,  denoted  by  I, 

2.  the  set  of  all  possible  actions  (moves),  denoted  by  A, 

3.  the  order  of  moves  (who  moves  when), 

4.  what  the  players’  available  actions  are  when  it’s  their  turn  to  move, 

5.  the  players'  payoffs  as  a  function  of  the  moves  that  were  made, 

6.  what  each  player  knows  when  he  makes  his  choices,  and 

7.  the  probability  distributions  over  any  exogenous  events,  represented  as  moves  by  a  special 
player  Nature  and  denoted  by  N  G  X. 

A  finite  game  tree  T  represents  the  order  of  play  and  the  players’  available  choices  at  each  stage 
in  the  game  (points  3  and  4).  Using  a  finite  tree  to  represent  game  histories  rules  out  infinite  games. 
In  addition,  in  a  tree  each  node  (except  the  root  node)  has  exactly  one  predecessor,  which  means 
that  distinct  game  histories  cannot  converge  at  a  single  node.  Each  game  tree  node  s  €  S  is  thus  a 
complete  description  of  the  history  up  to  that  point  in  the  game. 

To  complete  the  specification  of  points  3  and  4,  in  addition  to  the  game  tree  we  introduce  two 
associated  functions.  The  state-ownership  map  o  :  S  — ►  1  specifies  which  player  may  perform 
an  action  in  each  state.  We  say  that  the  player  o(s)  owns  state  s.  The  action  assignment  function 
a  :  S  — >  A  labels  each  non-initial  node  s  with  the  last  action  taken  to  reach  it.  We  require  that  a  be 
one-to-one  on  the  set  of  immediate  successors  of  each  node,  so  that  different  successors  correspond 
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to  different  actions.  We  say  that  A(s)  is  the  set  of  actions  available  at  s.  Thus,  ,4(.s)  is  the  range  of 
a  on  the  set  of  immediate  successors  of  s. 

The  players’  payoffs  are  represented  by  utility  functions  ut  :  S  — >  II.  In  a  slight  departure 
from  [37],  we  assign  a  payoff  to  every  node  (instead  of  just  the  terminal  nodes),  with  ufs)  being 
player  i’s  payoff  if  the  game  is  terminated  at  node  s. 

Point  6,  the  information  players  have  when  choosing  their  actions,  is  the  most  subtle  paid  of  ex¬ 
tensive  form  games.  This  information  is  represented  using  information  sets  h  G  I  f  which  partition 
the  set  of  nodes  in  the  tree  -  that  is,  every  node  is  in  exactly  one  information  set.  The  interpretation 
of  the  information  set  h(s)  containing  node  s  is  that  the  player  who  is  choosing  an  action  at  s  is 
uncertain  if  he  is  at  s  or  at  some  other  node  s'  G  h(s).  To  prevent  ambiguity  with  move  ordering, 
we  require  that  the  same  player  must  move  in  all  nodes  of  an  information  set,  and  further  that  the 
player  must  have  identical  action  choices  in  all  nodes  of  an  information  set.  Thus,  we  can  speak 
of  an  action  set  A(h)  of  an  information  set  h.  In  the  special  case  of  games  of  perfect  information 
all  information  sets  are  singletons  -  the  player  knows  at  all  times  exactly  where  in  the  game  tree 
the  current  configuration  is  located.  However,  in  our  intended  area  of  application  it  is  unrealistic  to 
assume  perfect  information  for  either  the  intruder  or  the  defender  of  a  network. 

In  summary,  an  extensive  form  game  is  a  tuple  Q  =  (X,  A,  T,  o,  a,  { ut \ i  G  X},  H)  incorporating 
the  set  of  players,  the  set  of  actions,  the  game  tree,  the  state  ownership  and  action  assignment 
functions,  utility  functions  for  each  player,  and  the  information  sets. 

A.2  Game-Theoretic  Attack  Models 

In  game-theoretic  terms,  an  attack  model  W  =  (X,  A,  T,  o,  a,  {ui\i  G  X}.  H)  is  an  extensive- 
form  game  with  three  players  X  =  {a,  D.  S}  representing  an  attacker,  a  defender,  and  the  system, 
respectively.  With  each  player  P  G  X  we  associate  actions  in  the  set  AP,  so  that  the  total  set  of  action 
in  the  game  is  A  =  Upgx  Tip.  In  general,  the  attacker’s  actions  move  the  system  “toward”  some 
undesirable  (from  the  system's  point  of  view)  state,  and  the  defender’s  actions  attempt  to  counteract 
that  effect. 

The  game  free  X,  with  node  set  S  and  transition  relation  r  :  S  x  S,  can  take  any  shape,  subject 
to  the  restrictions  imposed  by  the  extensive  form  game  formalism.  There  is  a  single  root  state  so 
representing  the  initial  state  of  each  player  before  any  action  has  taken  place. 

For  the  purpose  of  generating  attack  graphs  with  the  techniques  developed  in  Paid  I,  an  attack 
model  VV  so  defined  may  be  recast  as  a  Biichi  model  Myv  =  (S,  r,  {.so},  S,  S,  D),  with  the  node 
set  and  transition  relation  of  the  game  tree  interpreted,  respectively,  as  the  state  space  and  transition 
relation  of  the  Biichi  model.  The  labeling  D  of  the  Biichi  model  encodes  the  state  ownership  and 
action  assignment  functions:  for  all  states  s,  D(s)  =  {owner  =  o(s) ,  action  =  a(s)}. 

At  the  end  of  the  game  the  attacker  and  the  defender  get  a  payoff  that  generally  depends  on  the 
degree  of  attacker’s  success.  In  traditional  game-theoretic  terminology,  the  system  plays  the  role  of 
Nature  and  does  not  get  payoffs.  In  contrast,  the  attacker  and  the  defender  arc  active  players.  Each 
active  player  P  has  a  total  payoff  function  uP  :  S  — >  R,  associating  a  payoff  with  every  state  s  G  S. 

The  simplest  case  has  two  possible  payoffs  for  each  active  player,  {succ  =  1,  fail  =  -1}.  More 
complicated  situations  with  multiple  prioritized  attacker  targets  would  have  commensurately  more 
complex  payoff  functions.  The  defender  may  likewise  have  a  more  complicated  payoff  matrix.  For 
instance,  it  may  be  possible  for  the  defender  to  close  the  firewall  and  prevent  further  incursions 
by  the  attacker,  but  the  closed-firewall  outcome  is  less  desirable  than  the  status-quo  since  legiti- 
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mate  users  arc  also  prevented  from  accessing  the  network.  This  difference  will  be  reflected  in  the 
defender’s  payoffs. 


A. 3  Attack  Graphs 

The  full  game  tree  is  typically  very  large  for  any  attack  model  that  describes  a  non-trivial  system.  As 
we  have  explained  previously,  for  the  purposes  of  vulnerability  assessment  and  defense  planning  we 
are  particularly  interested  in  games  where  the  attacker  is  successful.  Intuitively,  an  attack  graph  is  a 
subgraph  of  the  game  tree  where  in  each  final  state  the  attacker’s  payoff  is  succ  and  the  defender’s 
payoff  is  fail.  The  difference  between  the  game  tree  and  the  attack  graph  is  that  the  latter  can  coa¬ 
lesce  identical  subtrees  into  a  single  branch,  and  can  represent  infinite  behaviors  with  cycles.  More 
formally,  an  attack  graph  is  a  safety  scenario  graph  with  an  appropriately  defined  safety  property 
(Section  2.1.8): 

Definition  A.l  Given  an  attack  model  W,  define  a  safety  property  P  with  the  logical  predicate 
Qp  :  S  — >  bool  such  that  Qp(s)  if  and  only  if  ua(s)  =  fail.  An  attack  graph  ofW  is  a  safety 
scenario  graph  Ga  =  ( Sa ,  ra,  {soli  £/,  D)>  where  Sa  C  S,  ra  C  r,  and  Sj  =  {s  :  5a|-i<5p(s)}. 

Using  the  game-theoretic  definitions  of  attack  models  and  attack  graphs,  we  can  model  situations 
where  the  network  actively  defends  itself  against  attack.  This  is  an  interesting  research  direction 
that  we  hope  will  be  explored  in  the  future.  In  Sections  A.5  and  A.6  we  offer  examples  of  future 
research  directions. 


A.4  Behavior  Strategies 

Let  Si  =  {s  G  S  |  o(s)  =  i}  be  the  set  of  states  owned  by  player  i.  Let  A*  =  Uses  A(s)  be  the  set 
of  all  actions  for  player  i.  A  pure  strategy  for  player  i  is  a  map  psi  :  Si  — >  At,  with  psfis)  6  A(s) 
for  all  states  s.  Player  i’ s  space  of  pure  strategies,  PS,,  is  simply  the  space  of  all  such  pst.  A 
strategy  profile  assigns  a  strategy  to  each  player  in  the  game. 

Given  a  pure  strategy  for  each  player  i  and  the  probability  distribution  over  Nature’s  (in  our 
case,  the  Network’s)  moves,  we  can  compute  a  probability  distribution  over  outcomes  and  thus 
assign  expected  payoffs  u-fpf  )  to  each  strategy  profile  pf. 

A.4.1  Strategic  Equilibria  -  Pure  Strategies 

A  pure-strategy  Nash  equilibrium  for  an  extensive-form  game  is  a  strategy  profile  pf*  such  that 
each  player  i’ s  strategy  pf*  maximizes  his  expected  payoff  given  the  strategies  of  his  opponents. 
Many  complex  games  do  not  admit  pure-strategy  equilibria,  but  pure  strategies  suffice  for  games  of 
perfect  information. 

Theorem  A.l  (Zermelo  1913;  Kuhn  1953 )  A  finite  game  of  perfect  information  has  a  pure-strategy 
Nash  equilibrium. 

The  proof  of  this  theorem  doubles  as  a  backward-induction  algorithm  for  constructing  the  equi¬ 
librium.  Since  the  game  is  finite,  it  has  a  set  of  penultimate  nodes  -  i.e.,  nodes  whose  immediate 
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Figure  A.  1 :  Risk  of  Discovery  Game  Tree 


successors  are  terminal  nodes.  Specify  that  the  player  who  can  move  at  each  such  node  chooses 
whichever  strategy  leads  to  the  successive  terminal  node  with  the  highest  payoff  for  him.  Now 
specify  that  each  player  at  nodes  whose  immediate  successors  are  penultimate  nodes  choses  the 
action  that  maximizes  her  payoff  over  the  feasible  successors,  given  that  players  at  the  penultimate 
nodes  play  as  we  have  specified  in  the  previous  step.  We  can  now  roll  back  through  the  tree,  spec¬ 
ifying  actions  at  each  node.  When  we  are  done,  we  will  have  specified  a  strategy  for  each  player, 
and  it  is  easy  to  check  that  these  strategies  form  a  Nash  equilibrium. 


A.4.2  Strategic  Equilibria  -  Mixed  Strategies 

Since  it  is  unreasonable  to  expect  either  the  attacker  or  the  defender  to  have  complete  knowledge 
of  the  state  of  the  network  during  an  intrusion,  we  must  consider  strategic  equilibria  for  games 
of  imperfect  information.  Such  games  require  players  to  “mix”  their  strategies  to  come  out  ahead 
(poker  is  a  well-known  example).  Game  theory  devotes  much  research  to  so-called  mixed  strategies 
that  specify  a  probability  distribution  over  actions  at  each  state. 

Recall  that  extensive-form  games  model  incomplete  information  with  a  collection  of  player- 
specific  sets  of  information  sets  {Hi\i  E  I).  The  player  choosing  an  action  at  state  s  is  uncertain  if 
he  is  at  s  or  at  some  other  node  s'  E  h(s).  Let  6(A(h))  denote  the  space  of  all  possible  probability 
distributions  on  actions  available  at  information  set  h. 

A  behavior  strategy  for  player  i ,  denoted  bt,  is  an  element  of  the  Cartesian  product  Xhl^rr,b(A(h,l)). 
That  is,  a  behavior  strategy  specifies  a  probability  distribution  over  possible  actions  for  each  infor¬ 
mation  set  of  the  player.  The  probability  distributions  at  different  information  sets  are  independent. 

In  a  pure  strategy,  the  distribution  is  degenerate  in  each  information  set:  one  action  has  probability 
one,  and  the  rest  probability  zero.  If  we  specify  a  behavior  strategy  for  each  player,  we  get  a  strategy 
profile  b  =  (iq, . . . ,  bj),  which  generates  a  probability  distribution  over  outcomes  and  thus  gives 
rise  to  an  expected  payoff  for  each  player.  A  Nash  equilibrium  in  behavior  strategies  is  a  profile 
such  that  no  player  can  increase  his  expected  payoff  by  using  a  different  behavior  strategy. 


A.5.  DYNAMIC  ANALYSIS:  RISK  OF  DISCOVERY 
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A.5  Dynamic  Analysis:  Risk  of  Discovery 

Consider  a  game-theoretic  attack  model  with  the  game  tree  depicted  in  Figure  A.  1 .  The  attacker  (I) 
wants  to  get  at  a  database  on  a  protected  machine.  He  would  prefer  to  get  administrative  access  to  the 
machine  (payoff=3)  but  may  content  himself  with  reading  some  of  the  data  (payoff=2).  The  attacker 
has  available  two  alternative  approaches.  He  may  go  for  a  simple  one-step  penetration  (action  a\), 
perhaps  accessing  the  database  with  a  password  obtained  via  a  little  social  engineering.  Or  he  may 
choose  a  more  sophisticated  route,  running  a  vulnerability  scan  (s)  followed  by  an  appropriately- 
prepared  break-in  to  gain  full  control  of  the  target  database  (a-D-  The  former  approach  nets  his  some 
data  out  of  the  database,  but  the  latter  gives  him  a  shot  at  the  optimal  payoff  (3). 

The  snag  is  that  the  scanning  action  may  be  detected  by  the  defender,  who  may  opt  to  close  the 
firewall  (c)  and  prevent  action  a-2-  In  that  case  the  attacker  gets  nothing  (payoff  0).  Since  this  is  a 
game  of  perfect  information,  the  hack  ward- induct  ion  algorithm  is  applicable  and  yields  the  Nash 
strategy  profile  ((ai,a2),c).  Knowing  that  the  scan  will  alert  the  defender  and  lead  to  a  firewall 
closure,  the  attacker  must  reject  the  sophisticated  break-in  and  choose  the  lower-payoff  approach. 

Despite  its  simplicity,  the  example  highlights  one  of  the  major  insights  of  game  theory — that 
the  threat  of  counter-action  by  an  opposing  agent  has  a  fundamental  significance  in  the  construction 
of  an  optimal  strategy  for  any  game  player.  In  the  context  of  network  security  this  means  that  the 
utility  of  actively  vigilant  defense  (perhaps  involving  intrusion  detection  and  dynamic  adjustment  of 
security  settings)  goes  beyond  prevention  of  specific  attacks.  The  mere  threat  of  counter-measures 
is  sufficient  to  alter  attacker’s  strategy,  perhaps  channeling  it  in  less  destructive  directions. 


A.6  Dynamic  Network  Defense  Evaluation  Methodology 

The  impact  of  a  static  network  defense  measure  can  be  evaluated  by  including  the  measure  in  the 
model  of  the  network  and  re-running  the  security  analysis.  A  similar  approach  can  work  for  dynamic 
defense  solutions,  but  it  calls  for  a  game-theoretic  treatment  that  accounts  for  attacker’s  strategies 
designed  to  cope  with  the  active  defense.  In  general  outline  the  method  consists  of  the  following 
steps: 

1 .  Model  the  network  and  attacker  behavior  with  the  defensive  actions  included  as  full-fledged 
transitions  of  the  state  machine. 

2.  Run  attack  graph  generator  to  produce  the  game  tree  of  the  attack  model. 

3.  Apply  backward  induction  (or  other  appropriate  game-theoretic  algorithm)  to  compute  opti¬ 
mal  strategies  for  both  the  attacker  and  the  defender. 

The  outcomes  achieved  by  the  attacker  and  the  defender  with  their  respective  optimal  strategies  give 
an  indication  of  the  effectiveness  of  a  particular  dynamic  defense  mechanism.  As  a  side  benefit,  the 
analysis  produces  a  concrete  strategy  that  can  be  programmed  into  the  defensive  software. 
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Appendix  B 


Probabilistic  Scenario  Graphs 


When  we  arc  modeling  a  system  designed  to  operate  in  an  uncertain  environment,  certain  transitions 
in  the  model  represent  the  system's  reaction  to  changes  in  the  environment.  We  can  think  of  such 
transitions  as  being  outside  of  the  system’s  control — they  occur  when  triggered  by  the  environment. 
When  no  empirical  information  is  available  about  the  relative  likelihood  of  such  environment-driven 
transitions,  we  can  model  them  only  as  nondeterministic  “choices”  made  by  the  environment.  How¬ 
ever,  sometimes  we  have  empirical  data  that  make  it  possible  to  assign  probabilities  to  environment- 
driven  transitions.  We  would  like  to  take  advantage  of  such  information  to  quantify  the  probabilistic 
behavior  of  scenario  graphs.  In  this  section  we  show  how  to  incorporate  probabilities  into  scenario 
graphs  and  demonstrate  a  correspondence  between  the  resulting  construction  and  Markov  Decision 
Processes  [3]  (MDPs).  The  correspondence  lets  us  take  advantage  of  the  existing  body  of  algorithms 
for  analyzing  MDPs. 

One  way  to  incorporate  probabilities  into  scenario  graphs  is  to  choose  a  subset  of  states  and 
make  transitions  out  of  those  states  probabilistic.  Suppose  that  the  graph  has  a  state  s  with  only 
two  outgoing  transitions.  In  a  regular  scenario  graph  the  choice  of  which  transition  to  take  when 
the  system  is  in  s  is  nondeterministic.  However,  we  may  have  some  empirical  data  that  enables  us 
to  estimate  that  whenever  the  system  is  in  state  s,  on  average  it  will  take  one  of  the  transitions  four 
times  out  of  ten  and  the  other  transition  the  six  remaining  times.  We  can  place  probabilities  0.4  and 
0.6  on  the  corresponding  edges  in  the  scenario  graph.  We  call  a  state  with  known  probabilities  for 
outgoing  transitions  probabilistic.  When  we  have  assigned  all  known  probabilities  in  this  way,  we 
are  left  with  a  scenario  graph  that  has  some  probabilistic  and  some  nondeterministic  states  in  it.  We 
call  such  mixed  scenario  graphs  probabilistic  scenario  graphs. 


B.l  Probabilistic  Scenario  Graphs 

For  the  rest  of  the  appendix,  we  assume  that  the  transition  relation  r  is  total  in  both  the  Biichi 
model  and  the  scenario  graph  in  question.  Since  the  scenario  graph  includes  only  those  scenarios 
that  violate  the  correctness  property,  it  excludes  some  states  and  transitions  that  exist  in  the  input 
Biichi  model  M .  These  excluded  transitions  can  have  non-zero  probability,  so  that  in  their  absence 
the  sum  of  probabilities  of  transitions  from  a  probabilistic  state  will  be  less  than  1 .  To  address  this 
problem,  we  add  a  catch-all  escape  state  se  to  the  scenario  graph  to  model  the  excluded  part  of  M . 
The  escape  state  is  neither  final  nor  accepting.  A  probabilistic  state  s  in  the  scenario  graph  will  have 
a  transition  to  se  if  and  only  if  in  M  there  is  a  transition  from  s  to  some  state  that  does  not  appeal-  in 
the  scenario  graph.  The  probability  of  going  from  s  to  se  is  computed  by  subtracting  from  one  the 
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sum  of  the  probabilities  of  going  to  all  other  states  represented  in  the  scenario  graph.  There  arc  no 
transitions  out  of  se  except  a  self-loop  (preserving  the  totality  of  r). 

We  call  scenarios  terminating  in  se  escape  scenarios ,  to  distinguish  them  from  failing  scenarios. 
A  scenario  graph  augmented  with  an  escape  state  in  this  manner  still  accepts  only  the  set  of  failing 
scenarios;  however,  in  the  following  discussion  it  will  be  handy  to  have  a  name  for  escape  scenarios. 


Definition  B.l  A  probabilistic  scenario  graph  is  a  tuple  G  =  (Sn,  Sq ,  se,  S,  r,  n,  S  o,  Sa,  Sf,  D ), 
where  Sn  is  a  set  of  nondeterministic  states,  Sq  is  a  set  of  probabilistic  states,  se  G  Sn  is  a  nonde- 
terministic  escape  state  (se  fi  Sa  and  se  f  Sf ),  S  =  Sn  U  Sq  is  the  set  of  all  states,  r  C  S  x  Ax  S  is 
a  transition  relation,  tt  :  Sq  — ►  S  — *  3ft  are  transition  probabilities,  S  o  C  S  is  a  set  of  initial  states, 
Sa  C  S  is  a  set  of  acceptance  states,  S  /  C  S  is  a  set  of  final  states,  and  D  :  S  — >  2a(A  =  2AP)  is 
a  labeling  of  states  with  sets  of  alphabet  letters. 

A  probabilistic  scenario  graph  (PSG)  distinguishes  between  nondeterministic  states  (set  Sn) 
and  probabilistic  states  (set  Sq).  The  sets  of  nondeterministic  and  probabilistic  states  arc  disjoint 
(Sn  n  Sq  =  0).  The  function  tt  specifies  probabilities  of  transitions  from  probabilistic  states,  so  that 
for  all  transitions  (si,  S2)  €  r  such  that  si  G  Sq,  we  have  Prob(si  — >  S2)  =  vr(si)(s2)  >  0.  Thus, 
n(s)  can  be  viewed  as  a  probability  distribution  on  next  states.  Intuitively,  when  the  system  is  in 
a  nondeterministic  state  sn,  we  have  no  information  about  the  relative  probabilities  of  the  possible 
next  transitions.  When  the  system  is  in  a  probabilistic  state  sq,  it  will  choose  the  next  state  according 
to  probability  distribution  tt (sq). 

A  common  objection  to  modeling  software  systems  with  probabilities  is  that  these  probabilities 
are  difficult  or  impossible  to  obtain,  and  nobody  has  a  reason  to  try.  These  questions  arc  legitimate, 
but  they  are  beyond  the  scope  of  this  work.  The  chicken-and-egg  cycle  must  be  broken  somehow  or 
other,  and  if  we  can  propose  compelling  ways  of  incorporating  probabilities  into  reliability  analysis 
of  systems,  that  may  provide  impetus  for  other  researchers  to  obtain  these  probabilities  in  practice. 

B.1.1  Alternating  Probabilistic  Scenario  Graphs  and  Markov  Decision  Processes 

In  this  section  we  show  that  probabilistic  scenario  graphs  arc  equivalent  to  Markov  Decision  Pro¬ 
cesses  (without  the  cost  function).  We  then  demonstrate  how  a  benefit  function  can  be  assigned  to 
probabilistic  scenario  graphs  such  that  standard  MDP  algorithms  compute  answers  to  interesting 
quantitative  questions  about  the  corresponding  PSG. 

Definition  B.2  [3]  A  Markov  Decision  Process  is  a  tuple  (X,  A,  V ,  c)  where 

•  X  is  a  finite  state  space.  Generic  notation  for  MDP  states  will  be  x,  y,  z. 

•  A  is  a  finite  set  of  actions.  A(x)  C  A  denotes  the  actions  that  are  available  at  state  x.  The 
set  1C  =  {(x,  a)  :  x  G  X,  a  G  A(x)}  is  the  set  of  state-action  pairs.  A  generic  notation  for  an 
action  will  be  a. 

•  V  :  X  X  A  X  X  are  the  transition  probabilities,'  thus,  ( x ,  a ,  /y )  ( also  written  as  Dxay)  Is  the 

probability  of  moving  from  state  x  to  y  if  action  a  is  chosen. 

•  c  :  K,  P  is  an  immediate  cost.  The  cost  may  be  equivalently  viewed  as  a  negative  benefit. 
We  will  freely  use  the  term  benefit  to  mean  the  negative  cost,  and  vice  versa. 
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An  execution  fragment  of  an  MDP  is  a  sequence  .x'o«  i  x  \  . . .  anxn  of  alternating  states  and  actions 
such  that  the  sequence  begins  and  ends  with  a  state,  and  for  all  0  <  k  <  n,  0  <  V(xk- i,  cik ,  xf)  < 
1. 

It  is  possible  to  convert  a  probabilistic  scenario  graph  into  an  MDP  such  that  the  behaviors  of 
the  PSG  and  the  MDP  arc  identical.  To  explain  the  conversion  procedure,  we  define  a  restricted 
kind  of  probabilistic  scenario  graph. 

Definition  B.3  An  alternating  probabilistic  scenario  graph  is  a  tuple  G  =  (Sn.  Sq.  se,  S,  rn,  Tq,  i r,  So, 
Sa,  Sf ,  D ),  where  Sn  is  a  set  of  nondeterministic  states,  Sq  is  a  set  of  probabilistic  states,  se  E  Sn 
is  a  nondeterministic  escape  state,  S  =  Sn  U  Sq  is  the  set  of  all  states,  rn  C  Sn  x  Sq  is  a  set  of 
nondeterministic  transitions,  rq  C  Sq  X  Sn  is  a  set  of  probabilistic  transitions,  tt  :  Sq  — ►  Sn  — >  3ft 
are  transition  probabilities,  So  C  S  is  a  set  of  initial  states,  Sa  C  S  is  a  set  of  acceptance  states, 
Sf  C  S  is  a  set  of  final  states,  and  D  :  S  — »  A(A  =  2AP)  is  is  a  labeling  of  states  with  sets  of 
alphabet  letters. 

An  alternating  probabilistic  scenario  graph  (APSG)  does  not  have  any  transitions  between  two  con¬ 
secutive  nondeterministic  or  two  consecutive  probabilistic  states.  In  other  words,  a  nondeterministic 
state  has  transitions  to  probabilistic  states  only,  and  vice  versa.  An  execution  of  an  APSG  will  al¬ 
ways  have  strictly  alternating  nondeterministic  and  probabilistic  states. 

The  algorithm  shown  in  Figure  B.l  converts  a  PSG  Gp  =  (Sn,  Sq ,  se,  S,  r,  it,  So,  Sa .  Sf,  D ) 
into  an  APSG  G'p  =  ( S'n ,  S’q,  se,  S,  rn,  rq,  tt1,  So,  Sa,  Sf,  D')  that  has  equivalent  behaviors.  The 
algorithm  works  by  adding  hidden  states  and  transitions  to  the  graph  such  that  every  execution 
becomes  strictly  alternating,  yet  does  not  change  its  observable  (non-hidden)  components. 

We  start  with  S'n  =  Sn,  S'q  =  Sq,  r'  :=  0,  rq  :=  0,  tt'  :=  0.0,  and  D'  =  D.  Next, 

1.  Whenever  r  has  a  transition  from  probabilistic  state  si  to  nondeterministic  state  S2,  we  add 
the  transition  to  rp  and  its  probability  to  id. 

2.  Whenever  r  has  a  transition  from  nondeterministic  state  si  to  probabilistic  state  s 2,  we  add 
the  transition  to  rn. 

3.  Whenever  r  has  a  transition  between  two  nondeterministic  states  si  and  s 2,  we  add  a  hidden 
probabilistic  state  Sh  to  Sq,  an  observable  transition  .sq  — >  Sf,  to  r„,  and  a  hidden  transition 
Sh  S2  to  Tp,  assigning  the  latter  probability  1  in  7 d  (Figure  B.2a).  We  also  set  D\sh )  = 

D(s  t). 

4.  Whenever  r  has  a  transition  between  two  probabilistic  states  si  and  S2,  we  add  a  hidden 
nondeterministic  state  Sh  to  S'n,  a  hidden  transition  .s /,  — >  S2  to  rn,  and  an  observable  tran¬ 
sition  si  — >  Sh  to  Tp,  assigning  the  latter  the  original  probability  p  of  going  from  si  to  S2 
(Figure  B.2b).  We  also  set  Dfsf)  =  D(si). 

Let  e  be  an  execution  of  G  ' .  We  define  obs(e)  =  eobs  by  removing  hidden  states  and  hidden 
transitions  from  e. 

Lemma  B.l  If  e  is  an  execution  of  G  ’ ,  then  eobs  is  an  execution  ofG. 


Proof: 
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Input: 

G  =  (Sn,  Sq,  se,  S,  r,  7T,  SQ,  Sa,  Sf ,  D)  -  a  PSG 

Output: 

G'  =  ( S'n ,  5;,  se,  Tn,  Tq,  1 r',  50, Sa,  5/,  £>')  -  an  APSG 

Algorithm:  PSG-to-APSG(Sn,  Sq,  se,  S,  r,  tt,  S0,  Sa,  Sf,  D) 

Qf  . _  O  .  Qf  . _  Q 

On  • —  . —  Oq 

Tn  ■=  0;  Tg  :=  01 

$n)  ■  T  ( Sq,Sn )  . —  0.0, 

D'  :=  D 

foreach  transition  (si,  D(s\),  S2)  in  r  do 

if  si  is  probabilistic,  S2  is  nondeterministic  then 
Tp  :=  Tq  U  (si,  S2);  vr,(si)(s2)  :=  vr(si)(s2) 
elseif  si  is  nondeterministic,  s2  is  probabilistic  then 

Tn  ■=  Tn  U  (S1,S2) 

elseif  both  si  and  s2  arc  nondeterministic  then 
create  hidden  probabilistic  state  Sh 
create  observable  transition  (si,  s/,) 
create  hidden  transition  (s^,  s2) 

Tn  :=  Tn  U  (si,  Sh) 

Tq  :=  Tq  U  (sh,  S2) 

T '(Sh)(s2)  ■=  1 
S'q  :=  Sq  U  Sh 

D\sh )  :=  £>(si) 

elseif  both  si  and  s2  arc  probabilistic  then 
create  hidden  nondeterministic  state  s/, 
create  observable  transition  (si,  s/,) 
create  hidden  transition  (s^,  s2) 

Tn  :=  Tn  U  (s/j,  S2) 
r9  :=  Tq  U  (s1;  Sh) 
ir'(si)(sh)  ■=  7r(si)(s2) 

S'n  ■=  S'n  U  Sh 
D'(sh)  ■■=  D(S  1) 
end 

end 


//  Cose  1 
//  Case  2 
//  Case  3 


//  Case  4 


Figure  B.l:  Converting  PSG  to  APSG 
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Figure  B.2:  Converting  PSG  to  APSG 

e  can  be  represented  as  eor\h\t\e\  ■  ■  ■  en_i  rnhntnen,  where  ho,. . hn  are  hidden  states,  ro, . . . ,  rn 
are  transitions  into  hidden  states,  to, ,  tn  are  transitions  out  of  hidden  states,  and  eo,  •  •  • ,  en  are 
executions  consisting  only  of  observable  states. 

We  observe  that  eo, . . . ,  en+i  is  also  an  execution  in  G.  We  can  see  that  this  is  so  by  inspecting 
how  the  algorithm  adds  observable  states  and  transitions  to  G  .  All  observable  states  are  added  from 
G  at  initialization,  so  that  each  observable  state  in  C  also  occurs  in  G.  All  transitions  between  two 
observable  states  are  added  by  cases  1  or  2  of  the  algorithm  in  Figure  B.l,  so  that  each  such  transition 
also  occurs  in  G. 

The  proof  proceeds  by  induction  on  n.  For  n  =  0,  e  =  eobs  =  eo,  which  is  an  execution 
fragment  of  G.  For  n  =  k,  e  =  eprkhktkek,  where  the  prefix  fragment  ep  is  tor  i  h  \  1 \  e  i  . . .  e/,._  i . 
There  arc  two  cases: 

Case  1:  hk  is  a  hidden  probabilistic  state.  /;/.  was  added  to  G  by  case  3  of  the  algorithm.  So  tj.  is  a 
hidden  transition,  rj.  is  an  observable  transition,  and  eobs  = 

ep,s  is  an  execution  fragment  in  G.  By  the  induction  hypothesis,  e°bs  is  an  execution  in  G. 
Further,  by  the  precondition  of  case  3  of  the  algorithm  G  must  have  a  nondeterministic  transition 
between  the  last  state  of  e°bs  and  the  first  state  of  e^bs.  Therefore,  eobs  is  an  execution  in  G. 

Case  2:  hk  is  a  hidden  nondeterministic  state,  hj.  was  added  to  G  by  case  4  of  the  algorithm.  So  t-k 
is  a  hidden  transition,  is  an  observable  transition,  and  eobs  =  e°bsrke1bs. 

ep,s  is  an  execution  in  G.  By  the  induction  hypothesis,  epbs  is  an  execution  in  G.  Further,  by 
the  precondition  of  case  4  of  the  algorithm  in  Figure  B.l,  G  must  have  a  probabilistic  transition 
between  the  last  state  of  epbs  and  the  first  state  of  e'ps ,  with  the  same  probability  as  .  Therefore, 
eobs  is  an  execution  in  G. 


n 
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Lemma  B.2  obs  is  a  1-to-l  correspondence  between  successful  executions  of  G  '  and  G. 

Proof: 

onto  Let  e  be  a  successful  execution  in  G.  We  can  construct  a  pre-image  e /,  of  e  as  in  the  algorithm 
for  converting  PSGs  to  APSGs.  Replace  each  transition  between  two  consequent  nondeterministic 
states  with  two  transitions,  to  and  from  a  hidden  probabilistic  state  (Figure  B.2a).  Replace  each 
transition  between  two  consequent  probabilistic  states  with  two  transitions,  to  and  from  a  hidden 
nondeterministic  state  (Figure  B.2b).  It  is  easy  to  prove  by  induction  that  the  resulting  sequence  e  /, 
is  a  successful  execution  in  G  ,  and  that  obs(eh)  =  e. 

1-to-l  The  inductive  proof  is  easier  if  we  rewrite  subscripts  as  follows:  e  h  =  ■sot  i  *2  . . .  tn-isn. 
To  prove  that  obs  is  1-to-l,  suppose  there  is  an  execution  e'h  =  s'Qt\ s'2  ■  ■  ■  t'n_1s'n  in  G  such  that 
obs(e'h)  =  e.  We  show  that  e'h  =  eh  by  induction  on  subscript  n. 

Letting  n  =  0,  we  have  e  h.  =  so  and  e'h  =  Sq.  so  and  s'0  are  both  initial  states  in  G  ' ,  therefore 
they  are  observable.  We  have  e  =  obs(eh)  =  so  and  e  =  obs{e'h)  =  s’{y  So  so  =  s'0,  and  e/,  =  e'h. 

Letting  n  =  k  (where  k  >  0),  we  have  e/(  =  e^-i Xk,  where  xj~  is  a  state  when  k  is  even  and  a 
transition  when  k  is  odd,  and  e/,._  1  is  the  prefix  . . .  a:/;._  1 .  Similarly,  e'h  =  ef_  { x'k.  Note  that  by 
inductive  hypothesis  e'k_1  =  ek-i,  and  since  obs(e'h)  =  obs(eh),  we  must  have  obs(x'k )  =  obs(xk)- 
We  would  like  to  show  that  x’k  =  There  are  three  cases. 

Case  1:  x’k  is  a  state  (hidden  or  observable).  This  state  is  uniquely  determined  by  transition  t'k_v 
Since  by  inductive  hypothesis  t'k_1  =  tk-i,  x'k  must  be  the  same  state  as  x k- 

Case  2:  x'k  is  an  observable  transition.  Since  obs(x'k)  =  obsixk),  x'k  must  be  the  same  observable 
transition  as  Xk- 

Case  3:  x'k  is  an  hidden  transition.  By  construction  of  G  '  s'k_1  must  be  a  hidden  state.  By  inductive 
hypothesis  s'k_1  =  Sk- 1-  Since  a  hidden  state  has  exactly  one  outgoing  transition,  x'k  must  be  the 
same  transition  as  Xk- 

□ 


Definition  B.4  Define  the  behavior  of  a  scenario  graph  G  as  the  set  of  obsetyable  successful  exe¬ 
cutions  of  G:  Beh(G)  =  {e  :  3e'  G  L(G)  .  e  =  o&s(e/)}. 

Theorem  B.l  Beh(G')  =  Beh(G). 

Proof:  Follows  immediately  from  Lemma  B.2  □ 

We  have  shown  that  APSGs  have  the  same  expressive  power  as  PSGs,  so  hereafter  we  consider 
them  interchangeable. 

An  APSG  G  =  (Sn,  Sq .  se ,  S,  rn,  tp,  it,  Sq,  Sa,  Sf,  D)  has  a  direct  interpretation  as  an  MDP 
Mg  =  (X.,A,V,c).  Let  X  =  Sn,  A  =  rn.  That  is,  each  action  represents  a  transition  from  a 
nondeterministic  to  a  probabilistic  state.  Further,  let  x,y  G  X  and  a  G  A{x),  so  that  a  represents 
a  transition  from  x  to  some  probabilistic  state  sq  G  Sq  in  the  APSG.  Then  we  have  V(x,  a,  y)  = 
7r(sq)  (y).  For  now,  we  leave  the  cost  function  c  uninterpreted. 
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o  o  □ 

probabilistic  state  nondetermini  Stic  escape  state 
state 


Figure  B.3:  Converting  APSG  to  MDP 

It  is  preferable  to  have  all  APSG  acceptance  and  final  states  represented  explicitly  as  MDP 
states,  so  that  we  can  reason  about  failing  scenarios  in  the  MDP  context.  For  this  reason,  we  add  a 
hidden  nondeterministic  state  and  a  transition  to  it  to  every  probabilistic  fail  state  in  the  APSG.  We 
omit  proofs  of  equivalence  of  an  APSG  before  and  after  this  modification. 

Figure  B.3a  shows  an  example  APSG,  with  the  corresponding  MDP  shown  in  Figure  B.3b.  The 
nondeterministic  transitions  from  the  root  node  in  the  APSG  arc  represented  by  the  MDP  actions  a, 
b,  and  c.  The  leftmost  leaf  in  the  APSG  is  a  probabilistic  final  state;  in  the  MDP  it  is  represented  by 
the  appended  hidden  nondeterministic  state. 

APSG  components  ,S'o ,  S'a .  Sj ,  D  play  no  role  in  the  construction  of  the  MDP;  they  do,  however, 
play  a  role  in  our  interpretation  of  results  obtained  through  MDP  algorithms.  Finally,  we  can  choose 
the  cost  function  c  depending  on  the  questions  we  arc  trying  to  answer.  In  Section  B.2,  we  will  see 
how  particular  cost  assignments  can  help  us  compute  interesting  probabilities. 

Let  e  =  . . .  tnS ™  be  an  execution  of  APSG  G,  where  sk  and  tk  represent  non¬ 

deterministic  states  and  transitions,  respectively,  and  sk  and  tpk  represent  probabilistic  states  and 
transitions.  Let  mdp  :  executions  of  G  —>  executions  of  Mq  be  a  function  that  takes  an  execution 
e  of  APSG  G  and  strips  probabilistic  states  and  transitions,  leaving  a  sequence  of  nondeterministic 
states  and  transitions.  That  is,  if  e  =  . . .  t^s™,  then  mdp(e)  =  st  ■■  -  sn- 

Lemma  B.3  rndpie  )  is  an  execution  of  the  MDP  Me- 

Proof: 

mdp(e)  is  an  alternating  sequence  of  states  and  actions,  beginning  and  ending  in  a  state.  Let  0  < 
k  <  n.  By  construction,  'P(sk_1tksk)  =  n{sp)(sk)  >  0,  fulfilling  all  requirements  for  an  MDP 
execution. 

□ 


Lemma  B.4  mdp  is  a  1-to-l  correspondence  between  executions  of  G  that  begin  and  end  in  a 
nondeterministic  state  and  executions  of  Mq. 
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Figure  B.4:  Initial  Value  Assignments 


Proof: 

To  prove  that  mdp  is  onto,  let  em  =  Sq^i  si  •  •  •  s™  be  an  execution  of  Mq.  In  G  transition  t" 
ends  in  a  probabilistic  state.  Call  this  state  sp. 

Since  is  a  segment  of  execution  em,  we  must  have  'P(sgf”s”)  >  0.  So  in  G  we  have 

7r(si)(si )  >  0,  and  there  is  a  transition  tp  from  sp  to  s'(.  We  can  repeat  this  process  and  reconstruct 
spk  and  tJ!  for  all  0  <  /;:  <  n.  It  is  easy  to  see  that  e  =  Sg^i  si^isi  •  •  •  irU'”  is  an  execution  of  G  and 
em  =  mdp(e). 

To  prove  that  mdp  is  1-to-l,  let  e  and  e'  be  two  successful  executions  of  G  that  begin  and  end 
in  a  nondeterministic  state,  such  that  rndp(e)  =  mdp(e').  We  prove  that  e  =  e'  by  induction  on  the 
length  /  of  e  and  e' . 

When  l  =  1,  we  have  e  =  mdp(e)  =  so-  Since  mdp(e')  =  mdp(e),  we  must  also  have  e!  =  so- 

When  l  =  k  (l  >  1),  we  have  e  =  ek-iXk  and  e'  =  ek~vxrk.  By  inductive  hypothesis 
i  =  e'k  ! .  There  are  three  cases  to  consider: 

Case  1  Xk  is  a  nondeterministic  state  or  a  nondeterministic  transition.  Since  mdp{xk)  =  Xk  and 
mdp(x'k )  =  mdp(xk),  we  must  have  Xk  =  x'k. 

Case  2  .x/,.  is  a  probabilistic  state.  ;x/,:  is  uniquely  determined  by  nondeterministic  transition  tk  that 
immediately  precedes  it.  Since  by  inductive  hypothesis  tp  =  tpk,,  we  must  have  Xk  =  x'k. 

Case  3  x/,.  is  a  probabilistic  transition.  By  inductive  hypothesis  both  xp-  and  x'k  start  in  the  same 
probabilistic  state.  But  they  cannot  end  in  different  nondeterministic  states,  since  that  would  violate 
the  assumption  mdp(e)  =  mdp(e').  So  we  must  have  xj.  =  x'k. 

□ 


B.2  Probabilistic  Analysis 

Static  analysis  techniques  discussed  in  Chapter  9  take  a  binary  (yes/no)  approach  to  assessing  de¬ 
fense  effectiveness.  In  this  appendix  we  sketch  a  quantitative,  probabilistic  analysis  that  relies  on 
empirical  information  about  attack  frequencies  and  detection  probabilities.  The  analysis  uses  prob¬ 
abilistic  scenario  graph  ideas  presented  in  Appendix  B. 
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When  empirical  information  about  the  likelihood  of  certain  events  in  the  system  is  available, 
we  can  use  well-known  graph  algorithms  to  answer  quantitative  questions  about  an  attack  graph. 
Suppose  we  know  the  probabilities  of  some  transitions  in  the  attack  graph.  After  appropriately 
annotating  the  graph  with  these  probabilities,  we  interpret  it  as  a  Markov  Decision  Process,  as 
discussed  in  Section  B.  1. 1. 

The  standard  MDP  value  iteration  algorithm  [75]  computes  the  optimal  policy  for  selecting 
actions  in  an  MDP  that  results  in  maximum  benefit  (or  minimum  cost)  for  the  decision-maker. 
Value  iteration  can  compute  the  worst  case  probability  of  reaching  a  fail  state  in  a  probabilistic 
scenario  graph  as  follows.  We  assume  that  the  decision-maker’s  goal  is  to  maximize  the  probability 
of  reaching  a  fail  state,  and  assign  all  fail  states  the  benefit  value  of  1.  All  other  nodes  get  the 
benefit  value  of  0  (see  Figure  B.4  for  an  example).  Then  we  run  the  value  iteration  algorithm. 
The  algorithm  finds  the  optimal  transition  selection  policy  for  the  decision-maker  and  assigns  the 
expected  benefit  value  resulting  from  that  policy  to  each  state  in  the  scenario  graph.  The  expected 
value  is  a  fraction  of  1,  and  it  is  equivalent  to  the  probability  of  getting  to  a  fail  state  from  that  node, 
assuming  the  decision-maker  always  follows  the  optimal  policy.  In  the  example  in  Figure  B.4  the 
expected  benefit  at  the  initial  (top)  state  will  come  out  to  be  1,  as  we  can  simply  take  action  a ,  which 
always  leads  to  a  bad  state. 

We  implemented  the  value  iteration  algorithm  as  a  scenario  graph  post-processor  and  ran  it  on 
a  modified  version  of  the  example  from  Chapter  10.  In  the  modified  example  each  action  has  both 
detectable  and  stealthy  valiants.  The  intruder  chooses  which  action  to  tty  next,  and  he  has  a  certain 
probability  of  picking  a  stealthy  or  a  detectable  valiant. 

In  this  setup,  the  value  iteration  algorithm  computes  the  probability  that  the  intruder  will  suc¬ 
ceed  without  raising  an  alarm,  and  his  best  attack  strategy.  The  system  administrator  can  use  this 
technique  to  evaluate  effectiveness  of  various  security  fixes.  For  instance,  it  is  possible  to  com¬ 
pute  the  probability  of  intruder  success  after  installing  an  additional  IDS  component  to  monitor  the 
network  traffic  between  hosts  Windows  and  Linux  and  after  installing  a  host-based  IDS  on  host 
Linux.  Looking  at  the  computed  probability  in  each  case  gives  an  indication  of  which  option  is 
more  effective. 

It  is  worthwhile  to  point  out  that  the  probabilistic  reliability  analysis  discussed  here  is  useful 
only  for  exhaustive  and  succinct  scenario  graphs.  In  the  absence  of  either  property  the  results 
lose  meaning.  If  the  graph  is  not  succinct,  it  has  spurious  states  which  invalidate  the  analysis 
methodology  itself;  if  the  graph  is  not  exhaustive,  the  analysis  will  miss  viable  executions  that 
violate  the  property  under  consideration  and  thus  underestimate  the  danger. 
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Appendix  C 

Converting  LTL  to  Biichi  Automata 


Traditional  automata-theoretic  model  checkers  convert  correctness  properties  specified  in  temporal 
logic  into  automata  that  accept  infinite  sequences  only.  In  this  appendix,  we  sketch  a  modification  to 
the  classic  algorithm  for  converting  LTL  formulas  [39]  to  Biichi  automata.  The  modified  algorithm 
outputs  Biichi  automata  that  accept  both  finite  and  infinite  executions. 


C.l  LTL  to  Biichi  Conversion  Algorithm 

An  LTL  formula  P  induces  language  C(P)  of  propositional  sequences  over  the  alphabet  A  =  2AP: 

C(P)  =  U  <E  U  A*  |  f  |=  P} 

Given  P,  our  goal  is  to  construct  a  Biichi  automaton  Np  such  that  C(Np)  =  C(P).  The  basic  algo¬ 
rithm  for  converting  LTL  formulas  to  Biichi  automata  is  due  to  Gerth,  Peled,  Vardi,  and  Wolper  [39]. 
They  define  LTL  semantics  and  generate  Biichi  automata  with  respect  to  infinite  executions  only. 
We  modify  the  algorithm  slightly  so  it  works  for  both  finite  and  infinite  executions.  Except  for  the 
changes  related  to  finite  executions,  our  presentation  follows  [39]. 

The  algorithm  builds  a  graph  that  defines  the  states  and  transitions  of  the  output  automaton.  The 
nodes  of  the  graph  are  labeled  by  sets  of  sub-formulas  of  the  input  formula.  The  algorithm  constructs 
the  nodes  by  decomposing  formulas  according  to  their  Boolean  structure  and  by  expanding  the 
temporal  operators  in  order  to  separate  what  has  to  be  true  immediately  from  what  has  to  be  true  in 
the  next  state  and  thereafter.  The  fundamental  identity  used  for  expanding  temporal  operators  is 

/i  U  ip  =  ip  V  (n  A  X(/x  U  ip)) 

Before  describing  the  graph  construction  algorithm,  we  introduce  the  data  structure  used  to  represent 
the  graph  nodes.  Throughout  the  description  we  refer  to  the  LTL  formula  P  that  is  being  converted 
to  a  Biichi  automaton. 

ID  is  the  unique  identifier  for  the  node. 

Incoming  is  the  list  of  predecessor  nodes. 

New  is  a  list  of  sub-formulas  of  P.  It  represents  the  set  of  temporal  formulas  that  must  hold  at  the 
current  state  and  have  not  yet  been  processed  by  the  algorithm. 
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Old  is  the  set  of  sub-formulas  of  P  that  must  hold  in  the  node  and  have  already  been  processed 
by  the  algorithm.  As  the  algorithm  processes  the  sub-formulas  that  must  hold  in  the  node,  it 
moves  them  from  New  to  Old ,  eventually  leaving  New  empty. 

Next  is  the  set  of  temporal  formulas  that  must  hold  in  all  states  that  arc  immediate  successors  of 
the  node. 

Father  is  the  field  that  will  contain  the  name  of  the  node  from  which  the  current  one  has  been  split. 
We  use  this  field  for  reasoning  about  the  correctness  of  the  algorithm,  and  it  is  not  important 
for  the  construction. 

We  keep  a  list  of  nodes  Nodes  Net  whose  construction  was  completed.  We  denote  the  field  New  of 
the  node  q  by  New(q),  etc. 

The  fields  Old,  New,  and  Next  describe  temporal  properties  of  execution  suffixes  that  start  in  the 
current  node.  The  sub-formulas  in  these  fields  contain  information  about  a  suffix  a 1  of  a  computa¬ 
tion  a  when  the  following  condition  holds: 

Invariant  C.l  a1  satisfies  all  of  the  sub-formulas  in  either  Old  or  New,  and  al+l  satisfies  all  of  the 
sub-formulas  in  Next. 

Without  loss  of  generality,  we  assume  that  the  formula  P  has  been  converted  into  negation 
normal  form:  the  formula  consists  of  temporal  operators  U,  R,  and  X,  and  propositional  operators 
V,  A,  and  -i.  In  negation  normal  form,  the  operator  -i  is  only  applied  to  propositional  variables. 

The  line  numbers  in  the  following  descriptions  refer  to  the  algorithm  that  appeal's  in  pseudo¬ 
code  in  Figure  C.l.  The  algorithm  starts  with  a  single  node  with  the  formula  P  in  its  New  field  and 
a  special  node  init  in  its  Incoming  set  (lines  2-3).  By  the  end  of  the  construction,  a  node  will  be 
initial  in  the  Biichi  automaton  if  and  only  if  it  contains  the  label  init  in  its  Incoming  set. 

The  algorithm  processes  a  node  q  by  checking  if  there  are  unprocessed  sub-formulas  left  in  the 
set  New(q)  (line  13).  If  not,  the  node  q  is  fully  processed  and  ready  to  be  added  to  the  list  of  nodes 
Nodes  Net.  If  there  is  already  a  node  in  Nodes  Net  with  the  same  sub-formulas  in  both  its  Old  and 
Next  fields  (line  14),  the  algorithm  merely  adds  the  incoming  edges  of  q  to  the  copy  that  already 
exists  (line  15). 

If  no  node  equivalent  to  q  already  exists  in  Nodes  Net,  then  the  algorithm  adds  q  to  Nodes  Net 
and  forms  a  new  successor  r  to  q  as  follows  (lines  17-18): 

•  There  is  initially  one  edge  from  q  to  the  new  node  r. 

•  The  set  New  of  r  is  initially  set  to  the  Next  field  of  q. 

•  The  sets  Old  and  Next  of  r  are  initially  empty. 

While  list  New(q)  is  not  empty,  the  algorithm  picks  a  formula  //  out  of  New(q)  (line  19)  and  either 
splits  or  modifies  the  node  q  according  to  the  main  operator  of  rj. 
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Input:  P  -  an  LTL  formula 

Output:  Generalized  Biichi  automaton  Np  =  (S,t,  So,Sa,  Sf,  D) 
datatype  graph_node  =  [Name:  string,  Father :  string.  Incoming :  set  of  string, 

New :  set  of  formula.  Old :  set  of  formula.  Next:  set  of  formula]; 

1  Algorithm:  LTLtoBuchi(P) 

2  Nodes  <—  expand([Aflme  <t=  Father  <t=  NewName(),  Incoming  <t=  {init}, 

3  New  <=  {P},  Old  <t=  0,  Next  <t=  0],  0)); 

4  S  <—  Nodes', 

5  r  <—  {(g,  gr)  |  g  G  Incoming(q')}', 

6  So  <—  {g  |  init  G  Incoming(q)}; 

7  for  each  subformula  r)  =  /i  U  ip  of  P,  let  5a  =  {g  |  t/>  £  Old(q)  V  /r  U  ip  £  Old(q)}; 

8  Sa  <—  { SJI  |  77  is  a  subformula  of  the  form  //  U  L}; 

9  <—  {g  |  for  each  subformula  71  U  ip  of  P,  either  ip  G  Old(q)  or  (iTJ  ip  £  Old(q)}', 

10  D  <—  \q  <E  S  .  {X  C  AP  \  (X  D  Old(q)  (T  AP)  A  (X  D  {r/  G  AP  |  ^77  G  OM(g)}  =  0)}; 

11  return(iVp  4-  (5,  r,  S0,Sa,Sf,  D))\ 

12  function  expand(g,  Nodes_Set) 

13  if  New(q)  =  0  then 

14  if  3r  G  Nodes  Net  such  that  Old(  r  )=Old(q)/'\  Next  ( r )  =Next  ( g )  then 

15  Incoming(r )  <—  Incoming(r)U  Incoming(q)', 

16  return(Ao<ic5'A,cf); 

17  else  return(expand(|/Vame  <t=  Father  <t=  NewName(),  Incoming  <t=  {IVamc(g)}, 

18  Pew  <t=  {Afotf(g)},  Old  <t=  0,  Aferi  <t=  0],  {glUAGdesAFf)); 

19  else  let  77  G  New(q)', 

20  New(q)  <—  New(q)  \  {77}; 

21  if  (77  =  v4)  V  (77  =  -1  A)  V  ( rj  =  T)  V  (77  =  F)  then 

22  if  (77  =  F)  V  (Neg(r7)  G  Old(q))  then  -  q  contains  a  contradiction 

23  retu  rn( Nodes  Net) ;  -  discard  q 

24  else  Old(q)  <—  Old(q)  U  {77}; 

25  return(expand(g.  Nodes  Net))', 

26  elseif  (77  =  74  U  ip)  V  (77  =  /iRt/<)  V  (77  =  /r  V  ip)  then 

27  r  1  <—  [TV«777^  <t=  NewName(),  Father  <t=  Name(q),  Incoming  <t=  Incoming(q), 

28  Pew  New(q)  U  ({Newl(77)}\  Old(q)), 

29  Old  <t=  Old(q)  U  {77},  Peri  -<=  Next(q)  U  {Next  1(77)}]; 

30  r2  <—  [Name  <t=  NewName(),  Father  4=  Name(q),  Incoming  4=  Incoming(q) , 

31  Pew  New(q)  U  ({New2(77)}\  Old(q)), 

32  (9M  4=  Old(q)  U  {77},  Next  4=  Next{q)\, 

33  return(expand(r2,  expand(ri,  Nodes-Set))); 

34  elseif  77  =  71  A  ip  then 

35  return(expand([TVfl77?£  4=  Name(q),  Father  4=  Father(q), 

36  Incoming  4=  Incoming (g).  New  4=  New(q)  U  ({//,  t/z}\  OId(q)), 

37  (9M  <t=  Old(q)  U  {77},  Peri  4=  Next(q)\,  Nodes  Net)); 

38  elseif  77  =  X  n  then 

39  return(  exp  a  nd([Name  4=  Name(q),  Father  4=  Father  (q), 

40  Incoming  4=  Incoming (q).  New  4=  New(q), 

41  (9M  <t=  Old(q)  U  {77},  Peri  <t=  Next(q)  U  {77}],  Nodes _Set))\ 

42  end  expand; 


Figure  C.l:  Converting  LTL  formulas  to  Biichi  Automata 
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•  i]  is  a  proposition  p,  the  negation  of  a  proposition  -> p,  or  a  boolean  constant  T  or  F.  If  77  is  F  or 
— 177  is  in  Old,  we  discard  the  node  since  it  contains  a  contradiction  (lines  22-23).  Otherwise, 
the  node  is  modified  by  placing  rj  into  Old  if  not  already  there. 

When  i]  is  not  a  proposition,  the  node  q  can  be  split  into  two  (lines  26-33)  or  modified  in-place 
(lines  34-41).  The  exact  actions  depend  on  the  form  on  p: 

•  rj  =  pU  ip.  Because  p  U  ip  is  equivalent  to  ip  V  (p  A  X(p  U  ip)),  the  node  q  is  split  into 
two  nodes  rq  and  r2.  In  node  rq,  p  is  added  to  New  and  p  U  ip  to  Next.  In  node  r 2,  ip  is  added 
to  New. 

•  p  =  p  R  ip.  pR  ip  is  equivalent  to  ip  A  ( p  V  X(p  R  ip)),  which  in  turn  is  equivalent  to 
(ip  A  p)  V  (ip  A  X(p  Rip)).  So  the  node  is  split  into  two  nodes  rq  and  rq,  ip  is  added  to 
New  of  both  rq  and  rq,  p  is  added  to  New  of  rq,  and  p  R  ip  is  added  to  Next  of  r 2. 

•  p  =  p  V  ip.  The  node  is  split,  with  p  added  to  New  of  rq  and  ip  added  to  New  of  rq . 

•  p  =  p  A  ip.  The  node  is  modified,  with  p  and  ip  both  added  to  New. 

•  p  =  X  p.  The  node  is  modified,  with  p  added  to  Next. 

In  the  algorithm  listing,  the  function  NewName  ( )  generates  a  new  string  for  each  successive 
call.  The  symbol  A  stands  for  a  proposition  in  AP.  The  function  Neg(q)  is  defined  as  follows: 
Neg(2l)  =  ->A,  Neg(-iyl)  =  A,  Neg(T)  =  F,  Neg(F)  =  T.  The  functions  Newl(?7),  New2 (77), 
Next  1(77)  are  defined  in  the  following  table: 


V 

Newl  (77) 

Next  l(p) 

New2(r7) 

pU  ip 

{pUiP} 

M 

pRip 

M 

{p  R  V’} 

{/aY>} 

pV  ip 

0 

The  list  of  nodes  constructed  by  the  above  procedure  can  be  converted  to  a  generalized  B  iichi  au¬ 
tomaton  with  the  following  components: 

•  The  alphabet  is  2AP . 

•  The  set  of  states  S  is  the  set  nodes  generated  by  the  algorithm  (line  4). 

•  (<L  Ql)  €  t  if  and  only  if  q  £  Incoming(q')  (line  5). 

•  The  initial  states  arc  those  that  have  init  in  their  incoming  set  (line  6). 

•  The  acceptance  set  Sa  (lines  7-8)  contains  a  separate  set  of  states  Sp  for  each  subformula  of 
the  form  p  U  ip',  Sp  contains  all  the  states  q  such  that  either  ip  £  Old(q)  or  p  U  ip  (p  Old(q). 
Thus,  Sla  guarantees  that  if  p  U  ip  holds  at  some  state  in  some  accepting  run,  then  ip  must 
hold  later  on  the  same  run. 

•  The  set  of  final  states  S /  contains  all  states  that  do  not  have  unfulfilled  eventuality  obligations 
(line  9).  Specifically,  q  £  Sf  if  and  only  if  for  each  subformula  of  P  of  the  form  p  U  ip,  we 
have  ip  £  Old(q)  or  pU  ip  (jP  Old(q). 
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C.2  Correctness  of  LTLToBiichi 

In  this  section  we  sketch  elements  of  a  correctness  proof  proof  for  the  LTLToBiichi  algorithm. 
The  proof  is  incomplete;  in  particular,  Lemma  C.9  has  not  been  proven  for  finite  sequences. 

Let  A (q)  denote  the  value  of  Old(q)  at  the  point  where  the  construction  of  the  node  q  is  finished, 
i.e.  where  q  is  added  to  Nodes  Net,  at  line  25  of  the  algorithm.  Let  /\  T  denote  the  conjunction  of 
a  set  of  formulas  where  the  conjunction  of  an  empty  set  is  equal  to  T.  Let  Np  be  the  output 
generalized  Buchi  automaton  of  LTLToBiichi. 

Recall  that  a  propositional  sequence  £  =  xqxi  ...  is  a  sequence  in  (2AP)*  U  (2AP)U .  Let 
a  =  qoqi ...  be  a  sequence  of  states  of  Np  such  that  for  each  i  >  0,  (q*,  (/;+i)  G  Tp.  As  before,  £* 
denotes  the  suffix  XiXi+\Xi+2  ...  of  the  sequence  £. 

Lemma  C.l  Let  a  =  q^q\ . . .  be  a  path  in  Np,  and  let  /i  U  w  G  A(qo)-  Then  one  of  the  following 
holds: 

1.  V  0  <  i  <  |a|  :  p,  G  A (qf)  and  p  U  G  A (qf)  and  ^  ^  A (qf). 

2-  3  0  <  j  <  |a|  s.t.  V  0  <  i  <  j  :  p  G  A (qt)  and  p,  U  f  G  A [qf)  and  €  A (q.f). 

Proof:  Follows  by  straightforward  induction  on  the  length  of  a. 

a 


Lemma  C.2  When  a  node  q  is  split  during  the  construction  in  lines  26-32  into  two  nodes  r  ±  and 
p),  the  following  LTL  identity  holds: 


(/yOld (q)  A  y^New (q)  A  X  f\ Next(g))  4A 

((/yOld(ri)  A  yyNew(ri)  A  X  f\ Next(ri))  V  (/y01d(?’2)  A  /yNew(r2)  A  X  Next(r2))) 


Similarly,  when  a  node  q  is  updated  to  become  a  new  node  q',  as  in  lines  34-41,  the  following  holds: 


(/yOld(q)  A  /yNew(q)  A  X Next(q))  4A  (f\ Old(q')  A  y^New (q)  A  Xy^Next (q)) 


Proof:  Follows  directly  from  inspecting  the  cases  at  lines  26-4 1  the  algorithm  and  the  definition  of 
LTL.  □ 

Using  the  field  Father  we  can  link  each  node  to  the  one  from  which  it  was  split.  Thereby  we  define 
an  ancestor  relation  R,  where  (p,  q)  G  R  iff  Father(q)  =  Name(p).  Let  R*  be  the  transitive  closure 
of  R.  We  call  a  node  p  rooted  if  (p,p)  G  R.  A  rooted  node  can  arise  in  one  of  the  following  two 
circumstances: 

1.  p  is  the  initial  node  with  which  the  search  starts  at  lines  2-3.  Then,  p  has  New(p)  =  {  F  ). 

2.  p  is  obtained  at  lines  17-18  from  some  other  node  q  whose  construction  is  finished.  Then, 
New(p)  is  set  to  New(q). 
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Lemma  C.3  Let  p  be  a  rooted  node,  and  qi,q2,  ■  ■  ■  ,Qn  be  oil  of  p’s  direct  descendants,  so  that 
(p,  qk)  £  Rfor  all  1  <  k  <  n.  Let  ’ll  be  the  set  in  formulas  in  New  (p)  when  p  is  first  created.  Let 
Next  (qk)  be  the  values  of  the  Next  field  for  qj.  at  the  end  of  the  construction.  Then,  the  following 
LTL  identity  holds: 


**  V  i/\A(qk)  A  X /\Next(qk)) 

1  <k<n 

Moreover,  if £  j=  Vi<fc<n(A  A( )  A  X  /\  Next  (qk)),  then  there  exists  some  1  <  k  <  n  such  that 
£  |=  /\  A  (qk)  A  X  /\  Next(gfc)  and  for  each  pU  £  A  (qi~)  with  £  j=  fi,  is  also  in  A  (qk)- 

Proof:  By  induction  on  the  construction,  using  Lemma  C.2  □ 

Lemma  C.4  Let  qbe  a  node  and  £  a  propositional  sequence  such  that  £  |=  /\  A  (q)  Al/\  Next(g). 
Let  T  =  {fi  |  p  U  £  A (q)  A  fi  A (q)  A  (A  |=  fi}.  Then  there  exists  a  transition  (q,  q')  such 
that  A  |=  f\  A(q')  A  X  /\  Next(g')  and  T  C  A(q'). 

Proof: 

When  the  construction  of  node  q  is  finished,  the  algorithm  generates  a  node  r  with  New(r)  = 
Next(q)  =  ’h  (line  10).  Lemma  C.3  then  guarantees  the  existence  of  the  required  successor. 

□ 


Lemma  C.5  For  every  initial  state  q  £  So  of  the  automaton  Np,  we  have  P  £  A(q). 

Proof:  Follows  immediately  by  construction.  □ 

Lemma  C.6  For  an  automaton  Np  constructed  from  property  P,  the  following  holds: 

p v  (AA(?) A  *ANext(?))- 

q£S0 

Proof:  From  Lemma  C.3,  since  ’F  in  that  lemma  is  initially  { P} .  □ 

Lemma  C.7  Let  a  =  (juq\q:> . . .  be  an  execution  of  Mp  that  accepts  the  propositional  sequence  £. 
Then  £  |=  A  A(s0). 

Proof: 

By  induction  on  the  size  of  the  formulas  in  A(so)-  The  base  case  if  for  formulas  of  the  form 
A,  -i A,  where  A  £  AP.  We  will  show  only  the  case  p  U  £  A(r/o);  the  other  cases  are  treated 
similarly.  According  to  Lemma  C.  1  there  are  two  cases: 

1.  V  0  <  i  <  |a|  :  p  £  A (qf)  and  p  U  £  A (qf)  and  A  i  A {qf). 

2.  3  0  <  j  <  |a|  s.t.  VO  <i<j:pF  A(qi)  and  p  U  £  A  (qf)  and  £  A  (qj). 

If  a  is  finite,  its  terminal  state  satisfies  the  final  state  conditions  of  Np  constructed  at  line  9.  Those 
conditions  rule  out  case  1.  If  cr  is  infinite,  it  satisfies  the  acceptance  conditions  of  Np  constructed 
at  lines  7-8.  Those  conditions  also  rule  out  case  1.  Therefore,  only  case  2  is  possible.  Then,  by  the 
induction  hypothesis  £  \=  and  for  each  0  <  i  <  j,  £l  (=  p.  Thus,  by  the  semantic  definition  of 
LTL,  £\=  pUfi.  □ 
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Lemma  C.8  Let  a  =  qoqiq2  ■  ■  ■  be  an  execution  of  the  automaton  Np,  constructed  for  P,  that 
accepts  the  propositional  sequence  £.  Then  £  |=  P. 

Proof:  The  node  q o  is  an  initial  state.  From  Lemma  C.7  it  follows  that  £  |=  f\  A(so)  •  By  Lemma  C.5 
we  also  have  P  £  A(so).  Therefore,  £  |  =  P.  □ 

Lemma  C.9  Let  £  |=  P.  Then  there  exists  an  execution  a  of  Np  that  accepts  £. 

Proof: 

By  Lemma  C.6  there  exists  a  node  go  £  So  such  that  £  |=  f\  A(go)  A  X  /\  Next(qo).  We 
construct  the  desired  execution  a  =  gogig2  •  • .  by  repeated  application  of  Lemma  C.4,  starting  at 
node  go-  If  £*  (=  f\  A(g,)  A  X  /\  Nextiqf,  then  choose  ql+\  according  to  Lemma  C.4  as  a  successor 
of  qi  that  satisfies  two  conditions: 

(Cl)  £*+1  |=  AA(gi+i)  A  X  f\Next(qi+i),  and 

(C2)  for  any  subformula  p  U))£  A  (g*),  if  0  ^  A  (gj)  and  £*+1  |=  0,  then  0  6  A(gj+i). 

This  procedure  gives  an  inductive  definition  for  a.  At  each  step  of  this  procedure  we  pick  a  state 
qi+ 1  that  (by  condition  1)  satisfies  the  requirements  of  Lemma  C.4,  so  the  definition  of  a  is  well- 
formed.  It  remains  to  show  that  a  is  accepted  by  Np. 

Suppose  that  £  and  a  arc  infinite.  Np  accepts  a  if  at  least  one  member  of  each  set  of  acceptance 
states  computed  at  lines  7-8  occurs  infinitely  often  in  a.  So  let  r]  =  p  U  0  be  a  subformula  of  P, 
and  let  S%  =  {g  |  £  A(g)  V  p  U  f  ^  A(g)}  be  the  set  of  acceptance  states  associated  with  q 

on  line  7.  Further,  let  qi  be  any  state  in  a.  It  is  sufficient  to  show  that  there  exists  j  >  i  such  that 
qj  £  Sa . 

If  0  £  A(gj)  or  p  U  f  0  A(g*),  then  g.;  £  Sa,  so  assume  that  0  ^  A  (gj)  and  p  U  0  £  A  (gj). 
From  condition  (Cl)  we  know  that  £*  |=  /\  A(gt)  A  X  /\  Next( q, ) ,  so  £*  |=  p  U  0.  Therefore,  there 
must  be  some  minimal  j  >  i  such  that 

(Al)  e  |=  0 

By  Lemma  C.l  we  know  that  p  U  0  will  propagate  to  successors  of  qi  up  to  and  including  qj-\,  so 
we  have 

(A2)  /iU0£  A(gj_i) 

Now,  if  0  £  A(gj_i)  then  g^_ i  £  S'l  as  desired,  so  assume  that 
(A3)  0  i  A(gi_i) 

Assertions  (Al),  (A2),  and  (A3)  together  satisfy  the  requirements  of  condition  (C2),  so  we  get 
0  £  A  (gj)  and  therefore  q3  £  S'l,  as  desired. 

□ 

Theorem  C.l  The  automaton  Np  constructed  for  a  property  P  by  algorithm  LTLToBuchi  ac¬ 
cepts  exactly  the  executions  over  (2AP)*  U  ( 2AP)W  that  satisfy  P. 


Proof:  Follows  directly  from  Lemmas  C.8  and  C.9. 


□ 


APPENDIX  C.  CONVERTING  LTL  TO  B  UCHI  AUTOMATA 


Bibliography 


[1]  Computer  Oracle  and  Password  System  (COPS),  ftp.cert.org/pub/tools/cops. 

[2]  S.  Aggarwal,  R.  P.  Kurshan,  and  K.  Sabnani.  A  calculus  for  protocol  specification  and  vali¬ 
dation.  In  H.  Rudin  and  C.  H.  West,  editors.  Protocol  Specification,  Testing  and  Verification, 
pages  19-34.  North  Holland,  1983. 

[3]  Eitan  Altman.  Constrained  Markov  Decision  Processes.  Chapman  &  Hall/CRC,  1999. 

[4]  Paul  Animann,  Duminda  Wijesekera,  and  Saket  Kaushik.  Scalable,  graph-based  network  vul¬ 
nerability  analysis.  In  9th  ACM  Conference  on  Computer  and  Communications  Security,  pages 
217-224,  2002. 

[5]  Network  Associates.  Cybercop  scanner,  www.nai.com/asp_set/products/tns/ccscannerintro.asp. 

[6]  G.  Ausiello,  A.  D’Atri,  and  M.  Protasi.  Structure  preserving  reductions  among  convex  opti¬ 
mization  problems.  Journal  of  Computational  System  Sciences,  21:136-153,  1980. 

[7]  R.  Baldwin.  Kuang:  Rule  based  security  checking.  Technical  report,  MIT  Lab  for  Computer 
Science,  May  1994. 

[8]  R.E.  Bryant.  Graph-based  algorithms  for  boolean  function  manipulation.  IEEE  Transactions 
on  Computers,  C-35(8):677— 691,  August  1986. 

[9]  F.  Buccafurri,  T.  Eiter,  G.  Gottlob,  and  N.  Leone.  On  actl  formulas  having  deterministic 
counterexamples,  1999. 

[10]  J.  R.  Buchi.  On  a  decision  method  in  restricted  second  order  arithmetic.  In  Proceedings  of  the 
International  Congress  on  Logic,  Method,  and  Philosophy  of  Sciences,  pages  1-12.  Stanford 
University  Press,  1962. 

[11]  J.  Burch,  E.  Clarke,  K.  McMillan,  D.  Dill,  and  L.  Hwang.  Symbolic  model  checking:  1020 
states  and  beyond.  Information  and  Computation ,  98(2):  142-170,  June  1992. 

[12]  J.  R.  Burch,  E.  M.  Clarke,  D.  Long,  K.  L.  McMillan,  and  D.  L.  Dill.  Symbolic  model  checking 
for  sequential  circuit  verification.  IEEE  Transations  on  Computer-Aided  Design  of  Integrated 
Circuits,  1 3(4) :40 1-424,  April  1994. 

[13]  J.  R.  Burch,  E.  M.  Clarke,  and  D.  E.  Long.  Symbolic  model  checking  with  partitioned  transi¬ 
tion  relations.  In  A.  Halaas  and  P.  B.  Denyer,  editors,  Proceedings  of  the  1991  International 
Conference  on  Very  Large  Scale  Integration,  August  1991. 

[14]  Shawn  Butler.  Security  Attribute  Evaluation  Method.  PhD  thesis,  Carnegie  Mellon  University, 
Computer  Science  Department,  Pittsburgh,  PA,  May  2003. 


123 


124 


BIBLIOGRAPHY 


[15]  T.  Cattel.  Modelization  and  verification  of  a  multiprocessor  realtime  os  kernel.  In  Proceedings 
of  the  7th  FORTE  Conference ,  pages  35-51,  Bern,  Switzerland,  1994. 

[16]  J.  Chaves.  Formal  methods  at  AT&T,  an  industrial  usage  report.  In  Proceedings  of  the  4th 
FORTE  Conference,  pages  83-90,  Sydney,  Australia,  1991. 

[17]  E.  M.  Clarke,  E.  A.  Emerson,  and  A.  P.  Sistla.  Automatic  verification  of  finite-state  concur¬ 
rent  systems  using  temporal  logic  specifications.  In  Proceedings  of  the  10th  Annual  ACM 
Symposium  on  Principles  of  Programming  Languages ,  January  1983. 

[18]  E.  M.  Clarke,  E.  A.  Emerson,  and  A.  P.  Sistla.  Automatic  verification  of  finite-state  concurrent 
systems  using  temporal  logic  specifications.  ACM  Transactions  on  Programming  Languages 
and  Systems,  8(2):244-263,  1986. 

[19]  Edmund  M.  Clarke,  Orna  Grumberg,  and  Doron  A.  Peled.  Model  Checking.  The  MIT  Press, 
Cambridge,  Massachusetts,  2000. 

[20]  Edmund  M  Clarke,  Somesh  Jha,  Yuan  Lu,  and  Helmut  Veith.  Tree-like  counterexamples  in 
model  checking.  In  IEEE  Symposium  on  Logic  in  Computer  Science  (LICS)  2002,  page  11, 
July  2002. 

[21]  R.  W.  Cleaveland,  J.  Parrow,  and  B.  Steffen.  The  concurrency  workbench.  In  J.  Sifakis, 
editor,  Proceedings  of  the  1 989  International  Workshop  on  Automatic  Verification  Methods 
for  Finite  State  Systems,  volume  407  of  Lecture  Notes  in  Computer  Science,  pages  24-37. 
Springer- Verlag,  1989. 

[22]  T.H.  Cormen,  C.E.  Leiserson,  and  R.L.  Rivest.  Introduction  to  Algorithms.  MIT  Press,  1985. 

[23]  Cotse.net.  Vulnerability  Scanners,  http://www.cotse.com/tools/vuln.htm. 

[24]  O.  Coudert,  C.  Berhet,  and  J.  C.  Madre.  Verification  of  synchronous  sequential  machines  based 
on  symbolic  execution.  In  J.  Sifakis,  editor.  Proceedings  of  the  1989  International  Workshop 
on  Automatic  Vefification  Methods  for  Finite  State  Systems,  volume  407  of  Lecture  Notes  in 
Computer  Science.  Springer- Verlag,  June  1989. 

[25]  Frederic  Cuppens  and  Alexandre  Miege.  Alert  correlation  in  a  cooperative  intrusion  detection 
framework.  In  23r,/  IEEE  Symposium  on  Security  and  Privacy,  pages  187-200,  May  2002. 

[26]  Frederic  Cuppens  and  Rodolphe  Ortalo.  Lambda:  A  language  to  model  a  database  for  detec¬ 
tion  of  attacks.  In  Proceedings  of  the  Third  International  Workshop  on  the  Recent  Advances  in 
Intrusion  Detection  (RAID),  number  1907  in  LNCS,  pages  197-216.  Springer- Verlag,  2000. 

[27]  M.  Dacier.  Towards  Quantitative  Evaluation  of  Computer  Security.  PhD  thesis,  Institut  Na¬ 
tional  Polytechnique  de  Toulouse,  December  1994. 

[28]  M.  Dacier,  Y.  Deswartes,  and  M.  Kaaniche.  Quantitative  assessment  of  operational  security 
models  and  tools.  Technical  Report  Research  Report  96493,  LAAS,  May  1996. 

[29]  J.  Dawkins,  C.  Campbell,  and  J.  Hale.  Modeling  network  attacks:  Extending  the  attack  tree 
paradigm.  In  Workshop  on  Statistical  and  Machine  Learning  Techniques  in  Computer  Intru¬ 
sion  Detection.  Johns  Hopkins  University,  June  2002. 

[30]  Renaud  Deraison.  Nessus  Scanner,  http://www.nessus.org. 


BIBLIOGRAPHY 


125 


[31]  E.M.  Clarke,  O.  Grumberg,  K.L.  McMillan,  and  X.  Zhao.  Efficient  Generation  of  Counterex¬ 
amples  and  Witnesses  in  Symbolic  Model  Checking.  In  32nd  Design  Automation  Conference 
(DAC  95),  pages  427-432,  San  Francisco,  CA,  USA,  1995. 

[32]  E.M. Clarke  and  E.A.Emerson.  Synthesis  of  synchronization  skeletons  for  branching  time 
temporal  logic.  In  Logic  of  Programs:  Workshop,  volume  131  of  Lecture  Notes  in  Computer 
Science.  Springer- Verlag,  May  1981. 

[33]  E.  A.  Emerson.  Branching  Time  Temporal  Logic  and  the  Design  of  Correct  Concurrent  Pro¬ 
grams.  PhD  thesis,  Harvard  University,  1981. 

[34]  E.  A.  Emerson  and  C.  L.  Lei.  Temporal  model  checking  under  generalized  fainess  constraints. 
In  Proceedings  of  the  18th  Hawaii  International  Conference  on  System  Sciences,  1985. 

[35]  E.  A.  Emerson  and  C.L.  Lei.  Modalities  for  model  checking:  Branching  time  strikes  back.  In 
Proceedings  of  the  12th  Annual  ACM  Symposium  on  Principles  of  Programming  Languages, 
pages  84-96.  ACM  Press,  January  1985. 

[36]  M.  Fujita,  H.  Tanaka,  and  T.  Moto-oka.  Logic  design  assistance  with  temporal  logic.  In  Pro¬ 
ceedings  of  the  IFIP  WG10.2  International  Conference  on  Hardware  Description  Languages 
and  their  Applications,  1985. 

[37]  Drew  Fundenberg  and  Jean  Tirole.  Game  Theory.  MIT  Press,  1991. 

[38]  D.  Gabbay,  A.  Pnueli,  S.  Shelah,  and  J.  Stavi.  On  the  temporal  analysis  of  fairness.  In  Proceed¬ 
ings  of  the  7th  ACM  Symposium  on  Principles  of  Programming  Languages,  pages  163-173. 
ACM,  1980. 

[39]  Rob  Gerth,  Doron  Peled,  Moshe  Y.  Vardi,  and  Pierre  Wolper.  Simple  on-the-fly  automatic 
verification  of  linear  temporal  logic.  In  Protocol  Specification  Testing  and  Verification,  pages 
3-18,  Warsaw,  Poland,  1995.  Chapman  &  Hall. 

[40]  P.  Godefroid  and  P.  Wolper.  A  partial  approach  to  model  checking.  In  Proceedings  of  the  6th 
Symposium  on  Logic  in  Computer  Sciece,  pages  406-415,  Amsterdam,  July  1991. 

[41]  R.  Hardin,  Z.  Har’El,  and  R.  P.  Kurshan.  COSPAN.  In  R.  Alur  and  T.  A.  Henzinger,  editors, 
Proceedings  of  the  1996  Workshop  on  Computer-Aided  Verification,  volume  1102  of  Lecture 
Notes  in  Computer  Science,  pages  423-427.  Springer- Verlag,  1996. 

[42]  Z.  Har’El  and  R.  P.  Kurshan.  Software  for  analytical  development  of  communications  proto¬ 
cols.  AT&T  Technical  Journal,  69(1):45— 59,  Jan.-Feb.  1990. 

[43]  R.  Hojati,  R.  K.  Brayton,  and  R.  P.  Kurshan.  Bdd-based  debugging  of  designs  using  language 
containment  and  fair  ctl.  In  C.  Courcoubetis,  editor.  Computer  Aided  Verification:  Proc.  of  the 
5th  International  Conference  CAV’93,  pages  41-58.  Springer,  Berlin,  Heidelberg,  1993. 

[44]  G.  Holzmann.  Design  and  Validation  of  Computer  Protocols.  Prentice-Hall  International 
Editions,  Englewood  Cliffs,  NJ,  1991. 

[45]  G.  Holzmann,  P.  Godefroid,  and  D.  Pirottin.  Coverage  preserving  reduction  strategies  for 
reachability  analysis.  In  Proceedings  of  the  12th  International  Symposium  on  Protocol  Speci¬ 
fication,  Testing,  and  Vefification,  Lake  Buena  Vista,  Florida,  June  1992.  North-Holland. 


126 


BIBLIOGRAPHY 


[46]  G.  J.  Holzmann.  Proving  the  value  of  formal  methods.  In  Proceedings  of  the  7th  FORTE 
Conference,  pages  385-396,  Bern,  Switzerland,  1994. 

[47]  G.  J.  Holzmann.  The  theory  and  practice  of  a  formal  method:  Newcore.  In  Proceedings  of  the 
13tli  IFIP  World  Computer  Congress,  Hamburg,  Germany,  1994. 

[48]  G.  J.  Holzmann.  An  analysis  of  bitstate  hashing.  In  Proceedings  of  the  15th  Int.  Confon  Pro¬ 
tocol  Specification,  Testing,  and  Verification,  INWG/IFIP,  pages  301-314,  Warsaw,  Poland, 
1995.  Chapman  &  Hall. 

[49]  G.  J.  Holzmann.  The  SPIN  Model  Checker:  Primer  and  Reference  Manual.  Pearson  Educa¬ 
tional,  1997. 

[50]  Gerard  J.  Holzmann.  An  improved  protocol  reachability  analysis  technique.  Software  -  Prac¬ 
tice  and  Experience,  18(2):  137—161,  1988. 

[51]  G.J.  Holzmann.  On  limits  and  possibilities  of  automated  protocol  analysis.  In  H.  Rudin 
and  C.  West,  editors,  Proceedings  of  the  6th  Int.  Confon  Protocol  Specification,  Testing,  and 
Verification,  INWG/IFIP ,  Zurich,  Sw.,  June  1987. 

[52]  Somesh  Jha,  Oleg  Sheyner,  and  Jeannette  M.  Wing.  Minimization  and  reliability  analyses 
of  attack  graphs.  Technical  Report  CMU-CS-02-109,  Carnegie  Mellon  University,  February 
2002. 

[53]  Somesh  Jha  and  Jeannette  Wing.  Survivability  analysis  of  networked  systems.  In  Proceedings 
of  the  International  Conference  on  Software  Engineering,  Toronto,  Canada,  May  2001. 

[54]  Arvind  Kannan,  Meera  Sridhar,  and  Jeannette  Wing.  Personal  communication,  January  2004. 

[55]  K.L.  McMillan.  Symbolic  Model  Checking.  Kluwer  Academic  Publishers,  Norwell  Mas¬ 
sachusetts,  1993. 

[56]  O.  Kupferman  and  O.  Grumberg.  Buy  one,  get  one  free! ! !  Journal  of  Logic  and  Computation, 
6(4):523-539,  1996. 

[57]  R.  P.  Kurshan.  Analysis  of  discrete  event  coordination.  In  J.  W.  de  Bakker,  W.-P.  de  Roever, 
and  G.  Rozenberg,  editors.  Proceedings  of  the  REX  Workshop  on  Stepwise  Refinement  of  Dis¬ 
tributed  Systems,  Models,  Formalisms,  Correctness,  volume  430  of  Lecture  Notes  in  Computer 
Science.  Springer- Verlag,  May  1989. 

[58]  R.  P.  Kurshan.  Computer-Aided  Verification  of  Coordinating  Processes:  The  Automata- 
Theoretic  Approach.  Princeton  University  Press,  1994. 

[59]  R.P  Kurshan.  Computer-Aided  Verification  of  Coordinating  Processes:  The  Automata- 
Tlieoretic  Approach.  Princeton  University  Press,  1994. 

[60]  O.  Lichtenstein  and  A.  Pnueli.  Checking  that  finite  state  concurrent  programs  satisfy  their 
lineal-  specification.  In  Proceedings  of  the  12th  Annual  ACM  Symposium  on  Principles  of 
Programming  Languages,  pages  97-107.  ACM  Press,  1985. 

[61]  F.J.  Lin.  Specification  and  validation  of  communications  in  client/server  models.  In  Proceed¬ 
ings  of  the  1994  Int.  Conference  on  Network  Protocols  ICNP,  pages  108-116,  Boston,  Mass., 
1994. 


BIBLIOGRAPHY 


127 


[62]  Kong-Wei  Lye  and  Jeannette  Wing.  Game  strategies  in  network  security.  In  Foundations  of 
Computer  Security  Workshop ,  July  2002. 

[63]  Monika  Maidl.  The  common  fragment  of  CTL  and  LTL.  In  IEEE  Symposium  on  Foundations 
of  Computer  Science ,  pages  643-652,  2000. 

[64]  Y.  Malachi  and  S.  S.  Owicki.  Temporal  specifications  of  self-timed  systems.  In  H.  T.  Kung, 
B.  Sproull,  and  G.  Steele,  editors,  VLSI  Systems  and  Computations.  Computer  Science  Press, 
1981. 

[65]  K.  McMillan.  Using  unfolidng  to  avoid  the  state  explosion  problem  in  the  verification  of 
asynchronous  circuits.  In  Proceedings  of  the  4th  Workshop  on  Computer  Aided  Verification, 
Montreal,  June  1992. 

[66]  D.  E.  Muller.  Infinite  sequences  and  finite  machines.  In  Proceedings  of  the  4th  IEEE  Sympo¬ 
sium  on  Switching  Circuit  Theory  and  Logical  Design,  pages  3-16,  1963. 

[67]  Peng  Ning,  Yun  Cui,  and  Douglas  S.  Reeves.  Constructing  attack  scenarios  through  correlation 
of  intrusion  alerts.  In  Proceedings  of  the  9th  ACM  conference  on  Computer  and  communica¬ 
tions  security,  pages  245-254.  ACM  Press,  2002. 

[68]  NuSMV.  NuSMV:  A  New  Symbolic  Model  Checker. 

http://afrodite.itc.it: 1024/ "nusmv/. 

[69]  R.  Ortalo,  Y.  Deswarte,  and  M.  Kaaniche.  Experimenting  with  quantitative  evaluation  tools  for 
monitoring  operational  security.  IEEE  Transactions  on  Software  Engineering,  25(5):633— 650, 
September/October  1999. 

[70]  R.  Peikert.  ec-rcgular  languages  and  propositional  temporal  logic.  Technical  Report  85-10, 
ETH,  Zurich,  1985. 

[71]  D.  Peled.  Combining  partial  order  reductions  with  on-the-fly  model  checking.  In  D.  L.  Dill, 
editor,  Proceedings  of  the  1994  Workshop  on  Computer  Aided  Verification,  volume  818  of 
Lecture  Notes  in  Computer  Science,  pages  377-390.  Springer,  1994. 

[72]  C.A.  Phillips  and  L.P  Swiler.  A  graph-based  system  for  network  vulnerability  analysis.  In 
New  Security  Paradigms  Workshop,  pages  71-79,  1998. 

[73]  C.  Pixley.  A  computational  theory  and  implementation  of  sequential  hardware  equivalence. 
In  R.  Kurshan  and  E.  Clarke,  editors.  Proceedings  ofCAV  Workshop,  Rutgers  University,  NJ, 
June  1990. 

[74]  Amir  Pnueli.  The  temporal  logic  of  programs.  In  18th  IEEE  Symposium  on  Foundations  of 
Computer  Science,  pages  46-57.  IEEE  Computer  Society  Press,  1977. 

[75]  M.  Puterman.  Markov  Decision  Processes.  John  Wiley  &  Sons,  New  York,  NY,  1994. 

[76]  J.  P.  Quielle  and  J.  Sifakis.  Specification  and  verification  of  concurrent  systems  in  CESAR.  In 
Proceedings  of  the  Fifth  International  Symposium  in  Programming,  1981. 

[77]  C.  Ramakrishnan  and  R.  Sekar.  Model-based  vulnerability  analysis  of  computer  systems.  In 
Proceedings  of  the  2nd  International  Workshop  on  Verification,  Model  Checking  and  Abstract 
Interpretation,  September  1998. 


128 


BIBLIOGRAPHY 


[78]  John  Ramsdell.  Frame  propagation.  MITRE  Corp.,  2001. 

[79]  R.W.  Ritchey  and  R  Ammann.  Using  model  checking  to  analyze  network  vulnerabilities.  In 
Proceedings  of  the  IEEE  Symposium  on  Security  and  Privacy,  pages  156-165,  May  2001. 

[80]  O.  Sheyner,  J.  Haines,  S.  Jha,  R.  Lippmann,  and  J.M.  Wing.  Automated  generation  and  analy¬ 
sis  of  attack  graphs.  In  Proceedings  of  IEEE  Symposium  on  Security  and  Privacy,  May  2002. 

[81]  A.  Sistla,  M.  Vardi,  and  P.  Wolper.  The  complementation  problem  for  buchi  automata  with 
applications  to  temporal  logic.  Theoretical  Computer  Science,  47:217-237 ,  1987. 

[82]  A.  P.  Sistla.  Theoretical  Issues  in  the  Design  and  Verification  of  Distributed  Systems.  PhD 
thesis,  Harvard  University,  1983. 

[83]  A.  P.  Sistla  and  E.  M.  Clarke.  Complexity  of  propositional  temporal  logics.  Journal  of  the 
ACM,  32(3):733-749,  1985. 

[84]  Petr  Slavik.  A  tight  analysis  of  the  greedy  algorithm  for  set  cover.  In  28th  ACM  Symposium 
on  Theory  of  Computing  ( STOC ),  pages  435-441,  1996. 

[85]  SMV.  SMV:  A  Symbolic  Model  Checker.  http:// 

www . cs . emu . edu/ 'modelcheck/. 

[86]  Peter  Stephenson.  Using  formal  methods  for  forensic  analysis  of  intrusion  events  -  a  pre¬ 
liminary  examination.  White  Paper,  available  at  http://www.imfgroup.com/Document  Li- 
brary.html. 

[87]  U.  Stern  and  D.  Dill.  Improved  probabilistic  verification  by  hash  compaction.  In  Proceedings 
of  the  IFIP  WG  10.5  Advanced  Research  Working  Conf.  on  Correct  Hardware  Design  and 
Verification  Methods,  pages  206-224,  1995. 

[88]  U.  Stern  and  D.  Dill.  A  new  scheme  for  memory-efficient  probabilistic  verification.  In  Pro¬ 
ceedings  of  the  IFIP  TC6AVG6.1  Joint  Int.  Conf.  on  Formal  Description  Techn.  for  Distr.  Sys¬ 
tems  and  Comm.  Protocols,  and  Protocol  Spec.,  Testing,  and  Verification,  FORTE/PSTV96, 
pages  333-348.  North-Holland  Publ.,  1996. 

[89]  L.P  Swiler,  C.  Phillips,  D.  Ellis,  and  S.  Chakerian.  Computer-attack  graph  generation  tool.  In 
Proceedings  of  the  DARPA  Information  Survivability  Conference  and  Exposition ,  June  2000. 

[90]  R.  E.  Tarjan.  Depth  first  search  and  linear  graph  algorithms.  SIAM  Journal  of  Computing, 
1(2):  146-160,  June  1972. 

[91]  Steven  Templeton  and  Karl  Levitt.  A  requires/provides  model  for  computer  attacks.  In  Pro¬ 
ceedings  of  the  New  Security  Paradigms  Workshop,  Cork,  Ireland,  2000. 

[92]  A.  Valmari.  A  stubborn  attack  on  state  explosion.  In  Proceedings  of  the  2nd  Workshop  on 
Computer-Aided  Verification,  volume  531  of  Lecture  Notes  in  Computer  Science,  pages  156— 
165,  Rutgers,  June  1990. 

[93]  Moshe  Vardi  and  Pierre  Wolper.  An  automata-theoretic  approach  to  automatic  program  veri¬ 
fication.  In  First  IEEE  Symp.  on  Logic  in  Computer  Science,  pages  322-331,  1986. 

[94]  G.  Vigna  and  R.A.  Kemmerer.  Netstat:  A  network-based  intrusion  detection  system.  Journal 
of  Computer  Security,  7(1),  1999. 


BIBLIOGRAPHY 


129 


[95]  G.  von  Bochmann.  Hardware  specification  with  temporal  logic:  An  example.  IEEE  Transac¬ 
tions  on  Computers,  C-31(3),  March  1982. 

[96]  Common  Vulnerabilities  and  Exposures,  http://www.cve.mitre.org. 

[97]  P.  Wolper,  M.  Y.  Vardi,  and  A.  P.  Sistla.  Reasoning  about  infinite  computation  paths.  In 
Proceedings  of  the  24th  IEEE  Symposium  on  Foundations  of  Computer  Science,  pages  1 85— 
194.  Tucson,  AZ,  1983. 

[98]  Pierre  Wolper  and  Denis  Leroy.  Reliable  hashing  without  collision  detection.  In  Computer- 
Aided  Verification,  volume  697  of  Lecture  Notes  in  Computer  Science,  pages  59-70.  Springer- 
Verlag,  June  1993. 

[99]  D.  Zerkle  and  K.  Levitt.  Netkuang  -  a  multi-host  configuration  vulnerability  checker.  In  6th 
USENIX  Unix  Security  Symposium,  San  Jose,  CA,  1996. 


Index 


acceptance 
cycle,  26 

state,  5,  6,  13,  34,  35 
action,  53,  59,  61,  71,  77,  82 
rule,  61,  74-76 
triple,  61 
ACTL,  50 
agent,  57,  59 
alarm,  60 

alert  correlation,  55 

alternating  probabilistic  scenario  graph,  107, 
110 

ANGI,  83,  85,  86 
approximation,  69 
approximation  ratio,  66,  67 
APSG,  see  alternating  probabilistic  scenario 
graph 

attack 

graph,  3,  53,  58,  77,  92,  101 
model,  57,  59,  63,  64,  81,  101 
scenario,  64,  77,  91 

Biichi 

automata  intersection,  24 
automaton,  5,  6,  115 
automaton  example,  7 
model,  7,  29 

characteristic 
f-prefix,  11 
i-prefix,  11 
logical  predicate,  13 
prefix,  12 

connectivity,  59,  60,  73,  82 
correctness  property,  5,  7,  20,  29 
counterexample,  50 
critical  host,  53 
critical  set 

of  actions,  65,  78 
of  measures,  65,  78 


CTL*,  15,  48 
CVE,  71,72,  77,  82,  84 

detectable  action,  6 1 
DFS,  35,39,41 
DMZ,  71 
dynamic 

defense,  63 
defense  measure,  64 

effect,  61,  91 
escape 

scenario,  106 
state,  105,  107 
example  network,  71 
exhaustive 

rs-automaton,  12,  19 
scenario  automaton,  8,  26,  29 
scenario  graph,  see  exhaustive  scenario 
automaton 

explicit-state  algorithm,  39,  81 
exploit,  see  action 
exposure  minimization,  64 

f„ -equivalence,  11 
fn -prefix,  11 

f-exhaustive  rs-automaton,  13,  31,  34 
f-prefix,  11,  12 
f-representative 
execution,  9 
scenario,  9,  29 
f-scenario,  12,  31 

failure  scenario  graph,  see  scenario  graph 

final  state,  5,  6,  31,  64 

finite 

execution,  6,  29 
scenario,  29 

forensic  analysis,  53,  55 

game-theoretic  model,  93,  99 
generalized  Biichi  automaton,  7 


130 


INDEX 


131 


greedy 

critical  set,  78 
measure  set,  78 

hash  table,  4 1 

hashcompact  mode,  42,  42,  50 
hashcompact  mode  tradeoff,  42 
host,  59,  60,  81 

in -equivalence,  11 
in-prefix,  11 

i-exhaustive  rs-automaton,  13,  35,  37 
i-prefix,  11,  12 
i-representative 
execution,  11 
scenario,  9,  34 
i-scenario,  12,  35 
IDS,  55,  59,  61,71,73,  82 
IIS  web  server,  7 1 
infinite 

execution,  6,  29 
scenario,  13,  29,  34 
initial 

component,  21 
state,  31,  35,  64 
intruder,  59,  61,  74,  82 
effect,  61,  74-76 
precondition,  61,  74-76 
intrusion 

defense,  53,  55 
detection,  53 

intrusion  detection  system,  see  IDS 

knowledge,  61,  74 

language,  6,  20,  49,  1 15 
LICQ,  71,72 

Lineal-  Temporal  Logic,  see  LTL 
liveness  property,  5,  8 
LTL,  17,48,50,  115 


minimization,  64,  65,  78 
minimum 

critical  set  of  actions,  see  MCSA 
critical  set  of  measures,  see  MCSM 
hitting  cover,  see  MHC 
hitting  set,  see  MHS 
set  cover,  see  MSC 
model  builder,  81 
model  checking,  17,  47 
MPC,  67 
MSC,  65-67,  69 

Nessus,  55,  83,  84,  91 
network 

attack  graph,  59 
attack  model,  57,  59 
effect,  61,  74-76 
exposure,  63,  64,  78 
precondition,  61,  74-76 
topology,  82 
nondeterministic 
state,  106,  107 
transition,  107 

Outpost,  83,  84 

path,  65 
plvl,  61,74-76 
port,  60,  84 
port  scan,  61,  72 
postcondition,  see  effect 
precondition,  60,  61,  91 
privilege,  61,  77 
probabilistic 

analysis,  112 

scenario  graph,  105, 106,  107 
state,  106 
transition,  105 

propositional  sequence,  6,  119 
PSG,  see  probabilistic  scenario  graph 


Markov  Decision  Process,  see  MDP 

MCSA,  65,  69 

MCSM,  65,  66,  67,  69 

MDP,  105, 106,  110,  113 

measure,  64 

memory,  4 1 ,  44 

MHC,  66,  67 

MHS,  66 


Q3,  22,  31,  35 
Qv,  22,  31,  35 

r-scenario,  12,  31,  35 
realizable  set,  65,  66 

representative  scenario  automaton,  see  rs-automaton 

rs-automaton,  12,  18,  29 

RSet 


132 


INDEX 


algorithms,  22,  31 
function,  22 

RSET- EXISTENTIAL,  24,  26 
RSET- UNIVERSAL,  23 

safety 

property,  5,  8,  13,  58 
scenario  graph,  13,  17,  58 
SCC,  31 

SCC  graph,  21,  22,  26,  29,  31,  35,  39 
scenario,  7 

automaton,  7,  8,  26 
graph,  3,  5,  7,  57 
graph  generator,  8 1 
security  property,  57,  58,  64,  77,  86 
service,  60,  61,  84 
set  of 

actions,  64 
measures,  64 
sound 

rs-automaton,  12,  18,  34,  37 
scenario  automaton,  8,  26,  29 
scenario  graph,  see  sound  scenario  au¬ 
tomaton 

Squid  proxy,  71,  72,  78 
static 

analysis,  112 
defense,  63,  64 
defense  measure,  63 
stealthy  action,  6 1 
strongly  connected  component,  2 1 
graph,  see  SCC  graph 
succinct 

rs-automaton,  19,  34,  37 
scenario  automaton,  8,  26,  29 
scenario  graph,  see  succinct  scenario  au¬ 
tomaton 

symbolic  algorithm,  17,  39 

temporal  logic,  15,  47 
toolkit,  81 

traceback  mode,  39,  42,  44 
traceback  mode  tradeoff,  44 
trust,  59,  61,  82 

value  iteration  algorithm,  113 


XML  format,  83,  86 


